How lizards got their big feet


Creatures that survived wild weather in the Caribbean offer a rare and enlightening glimpse of
natural selection in action.


When Charles Darwin first saw a Madagascar star orchid, sent
to him by an enthusiast, he predicted the existence of
a long-tongued pollinator that could reach for nectar inside 
the flowers’ long tubes. The discovery of Morgan’s sphinx moth, which 
had a tongue just long enough (and no longer), proved Darwin right — 
some two decades after his death. 

It’s one of the great demonstrations of evolution by natural selection.
But what naturalists really want is to catch natural selection in
the act, as it’s doing the selecting. And opportunity knocked at the 
door of biologist Colin Donihue at Harvard University in Cambridge, 
Massachusetts, a year ago, just after he and his colleagues had returned 
from the Turks and Caicos Islands, where they had been studying anole 
lizards (Anolis scriptus). 

On 8 September last year, Hurricane Irma struck the islands, 
battering them with sustained winds of up to 265 kilometres per hour. 
Two weeks later, Hurricane Maria arrived. Dozens of people in the 
region died. Reconstruction and rebuilding efforts continue. 

Three weeks after the winds had died down, the researchers went 
back to assess the damage, and to see how (or even if) their lizards had 
survived. Their study, published this week in Nature, is the first to use an 
immediate before-and-after comparison to assess the impacts of hurricanes
on evolutionary selection (C. M. Donihue et al. Nature http://doi.org/csgp; 2018).
They saw clear trends of natural selection in action. In
general, anoles found after the storms had bigger toepads, longer forelimbs
and shorter hindlimbs than did lizards collected before the storms.

What have these traits to do with hurricanes? The lizards live in 
bushes and other low-growing vegetation. Toepads allow them purchase
on the branches as they move, and it’s a fair bet that limb proportions
also play a part in helping a lizard to keep in contact with a branch, 
resisting moves by predators, other lizards — or, as it turned out, 
hurricanes — to knock them off. 

The researchers took their idea further with a simple laboratory 
experiment in which they allowed lizards to get settled on a perch and 
then blew them off using a commercial leaf blower. (The lizards flew 
into comfy padding, and were not injured in the experiments.) 

This experiment showed that when lizards are subjected to a stiff 
breeze, they hang on tightly with their forelimbs and let their hindlimbs 
hang loose. Longer hindlimbs, then, give the wind more to catch, which explains
why lizards found after the storms tended to have shorter hindlimbs,
but longer forelimbs.

These bigger toepads, shorter hindlimbs and longer forelimbs did 
not evolve as a direct response to the hurricanes. Natural selection 
interfered with the way in which these traits were spread across the 
population. Specifically, those lizards unable to hang on when the 
storms blew up — those with smaller toepads, longer hindlimbs and 
shorter forelimbs — were (presumably) blown away and perished. 
Those lizards better able to hang on tightly would have survived to 
weather another day. In technical terms, the mean values of the crucial 
traits measured before the storms had shifted. 

These changes are phenotypic — merely observable characteristics. 
They say nothing about the genetic assimilation of such changes, which 
will presumably happen when the surviving lizards breed and new 
lizards are recruited to the population. Such changes are unlikely to be 
the last, given the extreme weather expected in the future.


Hot topic 


Pinning extreme weather on climate change is 
now routine and reliable science. 


Extreme weather is grabbing public attention. And a familiar
question is being asked: is climate change to blame?

For years, the standard response was that climate change
makes such events more likely, but it is hard to pin down the causes of
a particular event. That is now changing.

As we highlight in a News Feature this week (see page 20), extreme-event
attribution, the science of calculating how global warming
has changed the likelihood and magnitude of extreme heat, cold,
drought, rain or flooding, is ready to leave the lab. The research
has advanced to the point at which public agencies can take over
the task. That's down to progress in modelling, and more-accurate
observations.

Rapid-attribution services are already being set up in Germany and 
elsewhere in Europe. Officials on other continents should follow suit. 

Climate-change attribution statements are not yet ready for 
prime-time weather reports. But attribution is not just for the media 
or public curiosity. Policymakers, risk managers and courts will have 
a new decision-making tool on hand for issues related to climate 
change — to help plan infrastructure and, ultimately, assess liability. 

Scientists should continue to improve the usefulness of attribution 
statements. There is no universal threshold beyond which climate 
change becomes dangerous, so research must establish resilience 
limits for different regions and sectors of the economy. To do this, 
and to help prepare a large city’s health system or a region’s agriculture 
for more turbulent weather, climate researchers must join forces with 
economists, social scientists and local specialists. 

As attribution proceeds from science to service, the researchers who 
developed it can return to climate physics, such as exploring how clouds 
and climate interact, and how large-scale atmospheric circulation will 
change in a warming world. This is needed to continue checking the output
of the models and to make attribution more precise. That can only
help to highlight that our world is changing and we need to respond.
WORLD VIEW


Wildfire science is at a loss
for comprehensive data


An international monitoring initiative is crucial for understanding wildfires
and reducing their damage, says David Bowman.


In 2013, a ferocious fire destroyed a small seaside village east of
Hobart, Australia, the city where I live and work as a pyrogeographer.
A newspaper photograph captured the terror of the firestorm:
in it, a family shelters under a jetty in surreal, smoky orange twilight.
Last week, spookily similar images from Greece, where people fled to
the sea to escape an inferno, hit me with a jolt.

Fires have ignited elsewhere, including in Sweden and California,
in recent weeks. But we can say little for certain about trends in wildfires
worldwide. Data are too scant to say conclusively whether fires
are becoming more destructive. If humans are to live sustainably on
flammable landscapes, we need a global system for collecting data on
fires to gain a coherent picture and assess strategies.

My colleagues and I analysed records from 1979 to 2013 (W. M. Jolly
et al. Nature Commun. 6, 7537; 2015). We found
that fire seasons worldwide are lengthening, and
that ‘fire weather’ (the combination of humidity,
temperature, wind and other factors that help
blazes to spread) is becoming more extreme.
Yet the strong links between humans and flammable
landscapes make fire a natural hazard like
no other. We can both amplify and dampen the
cycle by setting fires and fighting them. So a longer
fire season does not necessarily mean more fires.

Satellite imaging has revolutionized our
understanding of fire activity: it has provided
global data on areas burnt and on variations
from season to season regarding when fires
occur and how large they grow. But satellite
imagery is imperfect. Time courses of high-resolution
images are separated by days, weeks
or even months, if clouds cover the fires or patches of burnt ground.
Geostationary satellites, which have higher orbits than other types, can
provide real-time information, but only at coarse resolution.

In addition, satellite images became widely available only in the 1980s,
so scientists know little about areas that burn infrequently. Instead, we
rely on historical proxies, such as dating fire scars on tree trunks and
examining charcoal layers in lake sediments.

Satellite analysis suggests that the total area burnt by wildfires
over the past 18 years has declined (N. Andela et al. Science 356,
1356-1362; 2017). This seems to be, in part, because much of the
world’s tropical savanna has been converted to ranches and cropland
that fragments landscapes and makes them less flammable. For disasters,
however, area burnt is usually less important than fire intensity, as
determined by estimates of the energy that fires release. We've found
that disastrous fires typically occur in regions with medium population
densities, and at times of anomalously high heat, wind or dryness
(D. M. J. S. Bowman et al. Nature Ecol. Evol. 1, 0058; 2017).

Even a low-intensity fire can have a severe biological impact, especially
in areas, such as rainforest, that are poorly adapted to handle it.
And reliably assessing severity requires field work to collect data that
might otherwise be missed. For example, fires under forest canopies
are generally invisible to satellites.

To gain a comprehensive view, we need an initiative similar to the 
confederation of national meteorological networks that monitors daily 
weather conditions. This global network, established over the nineteenth
and twentieth centuries, is an underappreciated scientific triumph.
Its data are the backbone of weather forecasting and calibrating
climate-change models. Imagine how unreliable weather projections 
would be if based only on satellite data, spotty field measurements and 
historical reconstructions — as is the case for landscape fires. 

Comparative analysis of fire activity became possible in Europe only 
in 2004, with the advent of the European Union's fire database, which 
contains data from 22 nations. Few countries maintain records of fires 
stretching further back in time. 

A global clearing house that monitors landscape 
fires could record the types of vegetation burnt, 
how severely, over what area and with what loss 
of life and property. We need these data to facilitate
analyses of causes and effects, evaluate fire-management
approaches and guide re-insurance
rates. They would let us predict how well burning
or not burning vegetation might modulate levels
of carbon dioxide and other greenhouse gases, and
assess how climate change, land management
and socio-economic policies influence fire vulnerability
and activity.

Creating a global database will pose a huge policy
and technical challenge. But so was building
meteorological stations. Some of the elements for 
such a database already exist in EU and national 
databases. As well as remote sensing, on-the-ground observations are 
essential for calibrating and validating these data streams. We could 
enlist citizen science: smartphones with Global Positioning System 
access could document biological impacts, and platforms such as 
Google Earth could help to merge and manage data. 

Achieving a global understanding of fire activity will require a major
international initiative, and will probably need to be led by an authority
such as the World Bank or the United Nations. Although the task would be
difficult and complicated, the rewards would be invaluable; other projects of
this scale reveal the potential: witness the impact of the World Health
Organization, and its assessment of the global burden of diseases.

Without improved mapping and monitoring, we will remain unable 
to answer the most basic questions about trends in wildfires. Acting 
in ignorance is a poor way to honour — let alone decrease — the lives 
lost to fires each year.


David Bowman is a pyrogeographer at the University of Tasmania in
Hobart, Australia, and studies the biophysical and human dimensions
of fire.
e-mail: david.bowman@utas.edu.au
Ketamine trial

A consumer-advocacy group 
filed a complaint with the US 
government on 25 July about 
two clinical trials in Minnesota 
that allegedly gave agitated 
patients ketamine and other 
sedatives without their consent 
and despite evidence that doing 
so could harm their health. 
Researchers at Hennepin 
County Medical Center 
(HCMC) in Minneapolis 
conducted the trials between 
2014 and June 2018. The 
advocacy group, Public Citizen 
of Washington DC, submitted
the complaint to the US
Office for Human Research
Protections and the Food
and Drug Administration.
Sixty-four doctors, bioethicists 
and academic researchers 
co-signed the group’s letter. A 
spokesperson for Hennepin 
Healthcare, which operates the 
HCMC, declined to comment 
on the studies until after 
ongoing internal and external 
investigations are complete. 
See go.nature.com/2vajebp for 
more. 


Embryo editing

Most US adults think that it would be appropriate to use gene editing
on human embryos to reduce the risk of developing a serious disease,
according to a survey of 2,537 people released on 26 July. The survey,
conducted by the Pew Research Center in Washington DC, found that
religion had a strong influence on responses. Almost three-quarters of
those who reported low levels of religious commitment supported gene
editing in embryos to reduce the risk of disease later in life, compared
to only 46% of those with high religious commitment. On average,
men were more supportive than women of gene editing in embryos,
and people with high levels of science knowledge were more comfortable
with the idea than were those with less. Only 19% of respondents
thought it appropriate to use gene editing to improve intelligence.


Volcanic eruption prompts evacuation

The Vanuatu government ordered all 11,000 residents of Ambae island
to evacuate on 27 July because of the erupting Manaro Voui volcano.
Ash plumes have risen more than 5 kilometres, darkening the sky and
making travel difficult as people prepared to move to neighbouring
islands. The Vanuatu meteorology and geo-hazards department
warned of the health risks of breathing volcanic ash and gas. Manaro
Voui is the most voluminous volcano in the South Pacific’s New Hebrides
archipelago, and an eruption last September forced an island-wide
evacuation then, too. It has continued to sporadically belch out ash,
triggering several partial evacuations earlier this year.


Stem-cell advance

Doctors in Japan are poised to implant neural cells made from
‘reprogrammed’ stem cells into people with Parkinson's disease. It is
only the third clinical application of induced pluripotent stem (iPS)
cells, which are developed by inducing the cells of body tissues such as
skin to revert to an embryonic-like state, from which they can morph
into other cell types. Researchers have used the technique to generate
precursors to cells that make the neurotransmitter dopamine, which
degenerate and die in people with Parkinson's disease. Physicians at
Kyoto University Hospital will inject 5 million of these precursor cells
into the brains of seven people with the condition. Because
dopamine-producing neurons are involved in motor skills, people with
the condition typically experience tremors and stiff muscles.
Participants will be observed for two years after the transplantation.
One of the trial’s leaders, stem-cell scientist Jun Takahashi at the
Center for iPS Cell Research and Application in Kyoto, demonstrated in
2017 that the therapy improved symptoms in monkeys that had a version
of the disease (T. Kikuchi et al. Nature 548, 592-596; 2017).
See go.nature.com/2kcgbap for more.


© 2018 Springer Nature Limited. All rights reserved.


Telescope boon 


A French consortium has 
become the twelfth member 
of the Square Kilometre Array 
(SKA), an effort to build the 
world’s largest radio telescope. 
The SKA will eventually consist 
of thousands of radio dishes 
in Africa and up to 1 million 
in Australia; construction is 
expected to begin in 2020. 
Last year, the coordinating 
organization scaled back its 
design for the first phase of 
the SKA to keep it within
a cost cap of €674 million
(US$790 million). Project
leaders hope that the addition
of members, who pay an
undisclosed fee, will allow for
construction of the original
design. Spain joined the
effort in June. The Maison
SKA-France consortium is the
first public-private group to
join the SKA and is headed by
the country’s largest research
agency, the CNRS. The
consortium includes four other
research organizations and
seven private companies. The
SKA’s other members comprise
11 nation states, including
China and India.


HEALTH


Ebola outbreak ends

An outbreak of the Ebola virus that began in the Democratic Republic
of the Congo (DRC) in early May was declared over by the country’s
health ministry on 24 July. A total of 54 people were infected, 33 of
whom died. The outbreak was centred in remote rural regions of the
northwestern Equateur Province, but cases were also reported in
Mbandaka, a city on the banks of the Congo River, a major transport
route. That prompted fears that the disease could spread quickly both
in the city and to other regions. The outbreak is the DRC’s ninth; the
first known outbreak of Ebola virus occurred in the nation in 1976.
Learning from mistakes in the large 2014-16 Ebola epidemic in West
Africa, international public-health bodies quickly gave the DRC
support, including helping to deploy several thousand doses of an
experimental Ebola vaccine (pictured).


Fetal trial stopped

A clinical trial in the Netherlands designed to test whether sildenafil
(Viagra) could treat a fetal growth condition has been cancelled after
an interim assessment found that the therapy caused harm. The
randomized trial treated 90 pregnant women with sildenafil and 93 with
a placebo. Seventeen fetuses in the treatment group developed a severe
lung disease and 11 of them died from the disease; only 3 fetuses in
the control group developed the lung disease, and none of them died
(fetuses in both groups also died from causes other than the lung
disease). The growth condition is caused by insufficient blood oxygen
and nutrient supply through the placenta, and carries a high risk of
fetal death or of brain damage in surviving babies. Researchers had
hoped that sildenafil would dilate blood vessels in the placenta and
improve fetal growth. The scientists running the trial notified
Canadian researchers conducting a similar study, who have temporarily
stopped their research.


TREND WATCH

Only 13% of Earth’s oceans can now be classed as wilderness,
according to a 26 July study (K. R. Jones et al. Curr. Biol.
https://doi.org/csh5; 2018).

Researchers used global data on 15 human stressors of oceans,
including fishing, pollution and commercial shipping. Areas were
defined as wilderness if they showed little impact from these stressors
(scoring in the bottom 10% of a measure of each), as well as a low
aggregate score that combined these human activities and climate
stresses such as ocean acidification.

Most wilderness areas are in the open ocean and around the poles, far
from human populations. Coastal ecosystems, which include centres of
biodiversity such as coral reefs, make up just 10% of the wilderness
area. Of all marine wilderness, just 5% is covered by marine protected
areas.

“High seas at the moment are a Wild West,” says author Kendall Jones,
a conservation biologist at the University of Queensland in Brisbane,
Australia. “It really highlights that those last few areas that are not
impacted, how important they are.”

The United Nations is debating a high-seas conservation treaty,
which should be signed by 2020.
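
The wilderness rule described in the Trend Watch item (a cell of ocean counts as wilderness only if it scores in the bottom 10% on each stressor and also has a low aggregate score) can be sketched in a few lines. This is an illustrative sketch only, not the study's code: the function names, the toy data layout and the 10% cut-off used for the aggregate score are assumptions, since the item does not say what counts as a "low" aggregate score.

```python
from typing import Dict, List


def percentile_cutoff(values: List[float], fraction: float) -> float:
    """Return the largest value that still lies in the bottom `fraction` of `values`."""
    ordered = sorted(values)
    # Number of cells in the bottom fraction; always keep at least one.
    k = max(1, int(len(ordered) * fraction))
    return ordered[k - 1]


def wilderness_mask(
    stressors: Dict[str, List[float]],
    aggregate: List[float],
    fraction: float = 0.10,            # "bottom 10%" from the article
    aggregate_fraction: float = 0.10,  # ASSUMPTION: article only says "low"
) -> List[bool]:
    """Flag each ocean cell that is low on every stressor and on the aggregate score."""
    cutoffs = {name: percentile_cutoff(vals, fraction) for name, vals in stressors.items()}
    agg_cutoff = percentile_cutoff(aggregate, aggregate_fraction)
    mask = []
    for i in range(len(aggregate)):
        low_on_each = all(vals[i] <= cutoffs[name] for name, vals in stressors.items())
        mask.append(low_on_each and aggregate[i] <= agg_cutoff)
    return mask


# Hypothetical toy data: ten ocean cells, two stressors, one aggregate score.
fishing = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
shipping = [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]
combined = [0, 10, 10, 10, 10, 10, 10, 10, 10, 10]
print(wilderness_mask({"fishing": fishing, "shipping": shipping}, combined))
```

Requiring a low score on every stressor at once is what makes the definition strict: a cell with almost no fishing but heavy shipping is excluded, which is consistent with only a small share of the ocean qualifying.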


University lawsuit 


The latest in a string of lawsuits
filed against the University of
Southern California (USC)
in Los Angeles claims that the
university “suppressed and
concealed” years of sexual-harassment
complaints against
former gynaecologist George
Tyndall. The complaints from
former and current students
span nearly three decades,
and include allegations that
Tyndall “perpetrated serial
sexual abuse, harassment,
molestation, and violation”,
according to one of the lawsuits
filed on 23 July. After an
internal investigation in 2016,
the USC suspended Tyndall
with pay, and he retired in
2017. California’s Department
of Education is currently
investigating the university's
response to the complaints.

In a statement to Nature, the
USC says that it will seek a
“prompt and fair resolution
that is respectful of our former
students”. Nature’s attempts to
reach Tyndall and his lawyer
were unsuccessful.


POLICY


Trial reporting

The US National Institutes of Health (NIH) has postponed enforcement
of a controversial rule that would have required behavioural studies
to register as clinical trials. In a 20 July announcement, the agency
said that until 24 September 2019, researchers performing basic trials
with human participants would not be penalized for failing to list
their studies on the government database ClinicalTrials.gov. Brain and
behavioural scientists had pushed back against a clinical-trial
reporting policy that the NIH announced in 2016, which held
non-invasive studies, such as cognitive tests and brain scans, to the
same standards as drug trials. Numerous researchers argued that these
rules were too burdensome and confusing. In its latest announcement,
the NIH asked scientists to submit suggestions of how to make the
policy more appropriate for basic research.


NATURE.COM
For daily news updates see:
www.nature.com/news


WILD OCEANS
Just 13% of Earth’s oceans should be considered wilderness,
according to a study of global impacts on the seas. Marine protected
areas encompass just 5% of the remaining wilderness.
[Map legend: marine protected areas >100,000 km². Map labels: The open ocean supports a remote wilderness area in the Pacific. There are substantial wilderness areas in the Southern Ocean.]
Vaccine scandal won’t put parents off immunizing their children p.14
Scientists disappointed by European court ruling on gene-edited plants p.16
Efforts to gut US Endangered Species Act face resistance p.17
Assessing the role of climate change in natural disasters p.20


Mars is thought to host a buried lake that could change how scientists want to explore the red planet. 


Signs of buried lake on Mars tantalize scientists


If confirmed, the lake would be the first body of liquid water ever detected on the red planet. 


BY ALEXANDRA WITZE 


A large saltwater lake seems to lurk
under ice near Mars’s south pole. If
confirmed, it would be the first body
of liquid water ever detected on the red planet,
and a significant milestone in the quest to
determine whether life exists there.

“It’s a very promising place to look for life on
Mars,” says Roberto Orosei, a planetary scientist
at the National Institute of Astrophysics in
Bologna, Italy. “But we do not know for sure if
it is inhabited.” On Earth, similar ‘subglacial’ 
lakes are home to microbial life. 


A team of Italian researchers, led by Orosei,
reported the discovery on 25 July in Science¹.
They spotted evidence of the buried lake in 
radar data from the European Space Agency’s 
Mars Express spacecraft. 

Others say that the work is tantalizing but,
like anything else in the controversial hunt
for water on Mars, it needs more supporting
evidence. “It’s not quite a slam dunk yet,” says
Jeffrey Plaut, a planetary scientist at NASA’s
Jet Propulsion Laboratory in Pasadena,
California, who has searched for water using
data from Mars Express².

If further studies confirm the existence of a
lake, the discovery could open new avenues for
investigating Mars. Researchers have drilled 
into subglacial lakes on Earth and sampled 
the water for signs of microbes, while others 
are developing technologies to reach a buried 
ocean on Jupiter’s moon Europa. There are no 
ice-drilling missions currently slated for Mars 
— but the latest discovery could change how 
scientists think about exploring the planet. 
“It begins a new line of inquiry that’s very
exciting,” says Jim Green, NASA’s chief scientist.
Water appears across Mars today in various
forms, left over from a time billions of years
ago when the planet was warmer and wetter.
Orbiting probes have spotted ice, including
buried glaciers, in many locations. Spacecraft
have photographed steep slopes whose appearance
changes seasonally, as if liquid water is
running downhill and leaving dark marks.
And NASA's Curiosity rover has measured
water vapour in the planet’s atmosphere.

Radar tracks on Mars’s Planum Australe show the location of a potential buried lake (in blue).

Orosei and his colleagues found the lake 
using a radar instrument called MARSIS 
aboard Mars Express, which launched in 2003. 
It sends radio waves bouncing off the planet’s 
surface and subsurface layers; the way in which 
the radar signal reflects back reveals the type 
of material that is present, such as rock, ice or 
water. The scientists focused their search on 
the layers of ice and dust that cover the planet's 
south pole. 

But the observations were frustratingly 
inconsistent. Mars Express sometimes saw a 
bright reflection in several locations, which 
did not reappear the next time the spacecraft 
flew over those sites. Finally, in 2012, the scientists
decided to get MARSIS to send back
raw data, instead of performing automated
processing before beaming the data to Earth.
“This changed everything, and it was much 
more obvious to spot the bright reflectors,” 
says Orosei. 

The data showed the reflections coming 
from a 20-kilometre-long zone in a region 
known as Planum Australe. After ruling out 
other possible causes, such as carbon dioxide 
ice, the scientists concluded that the reflections 
were coming from subsurface water. 

The lake is about 1.5 kilometres beneath 
Mars’s icy surface and is at least 1 metre deep. 
To keep from freezing, the water must be very 
salty, Orosei says, perhaps similar to super-salty
subglacial lakes reported in the Canadian
Arctic earlier this year³. Salt-rich rocks beneath
the Canadian lakes infuse the water and allow
it to remain liquid, says Anja Rutishauser,
a glaciologist at the University of Alberta in
Edmonton. On Mars, salts known as perchlorates
might be making the brine; in 2008,
NASA’s Phoenix spacecraft found perchlorates 
in soils near the planet’s northern polar ice. 

Mars might have had many similar lakes
in the past, when heat rising from deep in
the planet melted some of the ice covering
its polar regions, says Stephen Clifford, a
planetary scientist who proposed the idea⁴ in
1987 and now works for the Planetary Science 
Institute in Houston, Texas. If life once thrived 
in ancient subsurface lakes, he says, the latest 
finding “raises support for the idea that life 
could still persist on Mars”. 

With liquid water and the right chemical 
elements available to supply energy, a buried 
martian lake would have the ingredients needed 
to sustain life — as long as it’s not too salty, says 
John Priscu, a biogeochemist at Montana State 
University in Bozeman. But exploring it won’t
be easy. Priscu leads a team that aims to drill
into Antarctica’s subglacial Lake Mercer later
this year; hauling tonnes of equipment and fuel
there required weeks of traversing the Antarctic
ice sheet with tractors. “There’s no way you’re
going to get all that to Mars,” he says.

But there are ways to learn more with spacecraft
already in play. Green notes that NASA’s
InSight probe, which is scheduled to land
near the martian equator in November, will 
measure heat flow in the top 5 metres of the 
surface there. Scientists can use those data to 
extrapolate how much heat might be rising 
from beneath the south polar cap, melting the 
ice and potentially creating more lakes. 

Orosei says his team has glimpsed other 
bright reflections, but isn’t ready to say 
whether or not they are lakes. More studies 
using MARSIS, as well as the radar on board 
NASA’s Mars Reconnaissance Orbiter — which 
has looked at Planum Australe and not seen 
the reflections — could help to reveal whether 
these are actually liquid water or something 
else, Plaut says.


1. Orosei, R. et al. Science https://doi.org/10.1126/science.aar7268 (2018).
2. Plaut, J. J. et al. Science 316, 92-95 (2007).
3. Rutishauser, A. et al. Sci. Adv. 4, eaar4353 (2018).
4. Clifford, S. M. J. Geophys. Res. 92, 9135-9152 (1987).
PUBLIC HEALTH


China vaccine scandal unlikely 
to dent immunization rates 


Vaccines are compulsory for children starting school in China, and enjoy public support.


BY NICKY PHILLIPS 


Problems with two Chinese-made
vaccines, one of which was distributed
to clinics and possibly injected into
hundreds of thousands of children, have led
to arrests and made international headlines. 


14 | NATURE | VOL 560 | 2 AUGUST 2018 


But researchers who study vaccination in
China don’t expect a major effect on the country’s
high immunization rates.

Widespread support for immunization
programmes, combined with strict vaccine
requirements for children starting school,
means that most parents will continue
vaccinating their children, they say.

“I don’t think there’ll be an appreciable drop
in vaccine coverage but it could impact when
people get vaccines, and where the vaccines
come from,” says Abram Wagner, an epidemiologist
at the University of Michigan in Ann
Arbor who has interviewed Chinese parents
about their views on immunization.

On 15 July, China's national drug watchdog 
revealed that during a surprise inspection 
of vaccine maker Changchun Changsheng 
Biotechnology’s facilities in the province of 
Jilin, it found that the company had faked production
data for several batches of the rabies
vaccine. Authorities ordered that the doses be
disposed of and revoked the company’s manufacturing
permit for that vaccine; it is not clear
whether anyone received the faulty doses. 

Five days later, local government investigators
announced that the same company
had violated standards in making about 
250,000 doses of the DTP vaccine, which 
protects against diphtheria, tetanus and 
pertussis (whooping cough), rendering the 
doses potentially ineffective. For that breach, 
the company says it was fined 3.44 million 
yuan (US$505,000). 

It is not known how many children received 
the faulty DTP vaccines, which were recalled 
when the problem was uncovered by authorities
in November, but so far no health issues
have been reported. The main concern is that 
these vaccines won't protect children from the 
dangerous infections that they're meant to 
combat, says Wagner. 

Parents turned to social media to voice their 
anger at the company and their concerns about 
domestic vaccines. But Xiaomin Wang, a social 
scientist at Zhejiang University Institute for 
Social Medicine in Hangzhou, agrees that the 
scandal is unlikely to reduce child immuniza- 
tion rates. 


PARENTS SURVEYED 
In 2016, when Chinese authorities discovered 
that childhood vaccines rendered ineffective 
by improper storage had been distributed to 
medical clinics across the country over five 
years, Wang and her colleagues went out and 
asked parents about their views on vaccines. 
They found that parents had very low faith 
in the safety of domestically produced vac- 
cines, which make up 95% of vaccines given 
in China — only 11% said they trusted them. 
But the researchers also found that more than 
half of parents surveyed still intended to rely 
on them to vaccinate their children (X. Wang 
et al. J. Health Commun. 23, 413-421; 2018). 
The disparity between parents’ trust in
domestic vaccines and their willingness to use
them is probably influenced by several factors,
including cost and availability, says one of
Wang's collaborators on the survey, Leesa Lin,
who studies social behaviour and risk perception
at the Harvard T. H. Chan School of Public
Health in Boston, Massachusetts.

China has one of the world’s highest rates of vaccination coverage.

Although the government subsidizes many 
domestic-made vaccines, those who have 
access to and can afford foreign-made vaccines 
are likely to seek them out, says Lin. “When 
an incident like this happens, the public might 
seek safer alternatives but would not give up on 
vaccination,” she says.

She attributes this, in part, to a widespread 
understanding of the benefits of vaccines, 
after decades of government campaigns pro- 
moting immunization. China has one of the 
world’s highest vaccination coverage rates — 
for example, 99% of Chinese infants receive 
the required three doses of the DTP vaccine, 
compared with 85% of infants globally, and 
95% in the United States. Lin and the team are 
about to submit for publication results from 
another survey, which confirmed parents’ 
strongly held faith in the benefits of vaccination
programmes.

Those results stand in contrast to the atti- 
tudes of some small groups of parents in 
Europe and the United States who resist vaccinating
their children, citing unfounded
safety concerns or religious reasons, says Lin.
“In China, those factors do not play much of
a role.”

Lin and Wang plan to survey parents again 
in the coming months to improve understand- 
ing of the factors that influence their decisions. 

Wagner notes that the DTP vaccine is man- 
datory for children starting school in China and 
that such requirements contribute to the coun- 
try’s high vaccine uptake. Some parents might 
delay vaccination in the wake of the latest scare, 
but they are unlikely to risk their child being 
denied entrance to school, he says. He adds that 
it’s particularly hard to get an exemption from 
such requirements in China, compared with 
other countries that have similar rules. 


REPAIR TRUST 

Wagner hopes that Chinese officials will be 
more transparent about what happened at 
Changchun Changsheng than they were dur- 
ing the 2016 scandal, and that they will take 
stronger action to prevent another incident. At 
the time, the government promised to improve 
oversight of vaccine manufacturing and transportation,
he says. “They talked big, but I'm
not exactly sure what they did,” he says. “I hope 
that they'll learn some lessons from this event 
and implement tighter regulations.” 

There are already encouraging signs. Several 
Changchun Changsheng executives, includ- 
ing the chair, have been arrested, and Chinese 
President Xi Jinping has said that the events 
are shocking and has ordered an investigation 
of the vaccine production chain. “This is a step 
forward compared to last time,” says Lin.

China's vaccine makers will also have to 
convince international markets that their 
vaccines are safe and effective if the coun- 
try is to become a major global producer, 
says Wagner. “Vaccines and pharmaceutical 
products could be a huge industry for them.”


In the EU, gene-edited crops and food will be treated in the same way as genetically modified organisms.


GENE EDITING 


EU law deals blow 
to CRISPR crops 


Top court’s ruling threatens research on gene-edited plants.


BY EWEN CALLAWAY 


Gene-edited crops should be subject to
the same stringent regulations that
govern conventional genetically modified
(GM) organisms, Europe’s highest court
ruled on 25 July.

The decision, handed down by the Court of 
Justice of the European Union (ECJ) in Lux- 
embourg, is a major setback for proponents of 
gene-edited crops, including many scientists. 
They had hoped that organisms created using 
relatively new, precise gene-editing technolo- 
gies such as CRISPR-Cas9 would be exempted 
from existing European law, which has limited 
the planting and sale of GM crops. 

Instead, the ECJ ruled that crops created 
using these technologies are subject to a 2001 
directive. That law was developed for older 
breeding techniques, and it puts high hurdles 
in the way of developing GM crops for food. 

“It is an important judgment, and it’s a 
very rigid judgment,” says Kai Purnhagen, a 
legal scholar at Wageningen University and 
Research in the Netherlands who specializes 
in European and international law. “It means 
for all the new inventions, such as CRISPR- 
Cas9 food, you would need to go through 
the lengthy approval process of the European 
Union.”

That is likely to hinder investment in crop
research using these tools in the EU, says Purnhagen.
“From a practical perspective, I don't
think this will be at all of interest for business.
So they will move somewhere else,” he says.

The ruling is “tremendously disappointing”,
says Nigel Halford, a crop geneticist at
Rothamsted Research in Harpenden, UK. “It’s
a real hit to the head,” he says. Gene-editing
techniques will still be used as a research tool
for developing crops, he adds, but he doubts
that companies in Europe will have much
appetite to develop them. “They are not going
to invest in a technology they see not having
any commercial application,” Halford says.

Environmental organization Friends of the 
Earth in Amsterdam, meanwhile, applauded 
the court’s decision in a statement. It also called
for all products made through gene editing 
to be regulated, assessed for their health and 
environmental impacts, and labelled. 


DNA CHANGES 

The 2001 EU directive behind the ECJ’s
decision concerns the intentional release of 
GM organisms into the environment — and 
was aimed at species into which entire genes, 
or long stretches of DNA, had been inserted. 
The law exempts organisms whose genomes 
were modified using ‘mutagenesis’ techniques,
such as irradiation, which introduce changes
to an organism’s DNA but don't add foreign
genetic material.

In 2016, the French government asked the 
ECJ to interpret the directive in light of plant- 
breeding techniques that have since emerged. 

Many plant breeders and scientists contend 
that gene-editing techniques such as CRISPR- 
Cas9 should be considered mutagenesis, just 
like irradiation, and thus be exempt from the 
directive, because they can involve changes to 
DNA and not the insertion of foreign genes. 
But people opposed to GM organisms contend 
that the deliberate nature of alterations made 
through gene editing means that they should 
fall under the directive. 

In January, an advocate-general with the 
court, Michal Bobek, issued a 15,000-word 
opinion that both sides claimed was partly in 
their favour. He said that gene-edited crops do 
constitute GM organisms under the original 
directive, but also that species modified using 
technologies discovered since 2001 — such 
as those used for gene editing — could be 
exempted, as long as they don’t contain DNA 
from other species, or artificial DNA. 

But in its ruling, the ECJ determined that 
only mutagenesis techniques that have “con- 
ventionally been used in a number of applica- 
tions and have a long safety record are exempt 
from those obligations”. Organisms made 
using mutagenesis techniques developed 
after 2001 — including gene editing — are not 
exempt from the directive. 


NO INCENTIVE 

“This will have a chilling effect on research, 
in the same way that GMO legislation has 
had a chilling effect for 15 years now,” says 
Stefan Jansson, a plant physiologist at Umeå
University in Sweden. Gene-edited crops will 
not vanish from European research labs, but 
he worries that the funding to develop them 
could dry up. “If we cannot produce things 
that society finds helpful, then they will be less 
likely to fund us.” 

Jansson also has practical concerns about 
the ruling. He developed a ‘CRISPR cabbage’ 
that he has consumed, and which was grow- 
ing in his home garden as he spoke to Nature. 
“I took a photo yesterday, and I took another
after the ruling. It’s still the same plant. Yesterday
it wasn't a GMO, and now it’s a GMO.
I'm a bit curious what I have to do. Do I have
to remove it?”

Purnhagen says that the ruling leaves open 
a possible loophole, whereby if scientists can 
prove that gene-editing techniques are as 
safe as mutagenesis methods already exempt 
from the law, such as irradiation, the new 
techniques, too, could earn an exemption. 

But he doubts that researchers and busi- 
nesses developing gene-edited crops will hold 
out hope. “I can’t see CRISPR-Cas9 and all these 
new technologies will be profitable in the Euro- 
pean Union. I can’t see this happening. I think 
this research will move somewhere else.”


ASTROPHYSICS


Black hole provides 
test for Einstein’s theory 


General relativity seen in action at Milky Way’s centre. 


BY ALEXANDRA WITZE 


Astronomers have caught the giant black
hole at the Galaxy’s centre stretching
the light emitted by an orbiting star,
nearly three decades after they first started
tracking the star. The long-sought phenomenon,
known as gravitational redshift, was
predicted by Albert Einstein’s general theory
of relativity, but until now, it had never been
detected in the environs of a black hole.

“It’s another big step in getting closer to 
understanding the black hole,” says Heino 
Falcke, an astronomer at Radboud University 
in Nijmegen, the Netherlands, who was not 
involved in the research. “This is just amaz- 
ing, to be able to see these effects.” 

A team led by astrophysicist Reinhard
Genzel of the Max Planck Institute for Extraterrestrial
Physics in Garching, Germany,
reported the discovery on 26 July in Astronomy
& Astrophysics (R. Abuter et al. Astron. Astrophys.
615, L15; 2018).

Genzel and his colleagues have been tracking 
the journey of this star, known as S2, since the
early 1990s. Using telescopes at the European 
Southern Observatory in Chile, the scientists 
watch as it travels in an elliptical orbit around 
the black hole, 8,000 parsecs (26,000 light years) 
from Earth. With a mass 4 million times that of 
the Sun, the black hole generates the strongest 
gravitational field in the Milky Way. That makes 
it an ideal place to hunt for relativistic effects.

On 19 May, S2 passed as close as it ever does to 
the black hole. The researchers traced the star's 
path using instruments including GRAVITY,
an interferometer that combines light from four
8-metre telescopes. GRAVITY measured S2’s
movement across the sky; at its fastest, the star
whizzed along at more than 7,600 kilometres
per second, nearly 3% of the speed of light. A
different instrument studied how fast S2 moved
towards and away from Earth as the star swung
past the black hole. Combining the observations
let Genzel’s team detect S2’s gravitational redshift,
which describes how its light is stretched
by the black hole’s immense gravitational pull.
Such a phenomenon is consistent with the predictions
of general relativity.
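
The figures quoted for S2 can be checked with a few lines of arithmetic. The sketch below is only a sanity check: the speed and distance are the numbers given in the article, the parsec conversion factor is an approximate standard value, and the variable names are illustrative. It puts the star's peak speed at roughly 2.5% of the speed of light, which the text rounds to "nearly 3%".

```python
# Back-of-the-envelope check of the figures quoted for S2.
SPEED_OF_LIGHT_KM_S = 299_792.458   # exact, by definition of the metre
LIGHT_YEARS_PER_PARSEC = 3.2616     # approximate conversion factor

s2_peak_speed_km_s = 7_600          # "more than 7,600 kilometres per second"
distance_parsecs = 8_000            # distance to the Galactic Centre, as quoted

fraction_of_c = s2_peak_speed_km_s / SPEED_OF_LIGHT_KM_S
distance_light_years = distance_parsecs * LIGHT_YEARS_PER_PARSEC

print(f"Peak speed: {fraction_of_c:.1%} of the speed of light")
print(f"Distance: about {distance_light_years:,.0f} light years")
```

The distance comes out close to the 26,000 light years given in the article.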

Future observations of S2 might confirm
other predictions made by Einstein, such as 
how a spinning black hole drags space-time 
around with it. 

“Their data look beautiful,” says Andrea
Ghez, an astronomer at the University of 
California, Los Angeles, who leads a compet- 
ing team that uses the Keck telescopes in Hawaii 
to measure the star's path around the Galactic 
Centre. It takes 16 years for S2 to make a com- 
plete orbit around the black hole, so both groups 
have been eagerly awaiting this year’s close passage.
They are still watching the star closely as it
slows to its minimum velocity in the line of sight
from Earth, another crucial event. “We're in
the thick of it,” says Ghez. “It’s super-exciting.”


US wildlife law in danger 


But bills that could strip protections from vulnerable species face resistance from legislators. 


BY JEREMY REHM 


The US Endangered Species Act, which
protects more than 2,000 species of
plants and animals (including insects)
at risk of extinction, is under renewed attack
from Republican politicians. But policy experts 
say that these efforts face an uphill battle, even 
though Republicans control the White House 
and both chambers of Congress. 

On 19 July, the US Fish and Wildlife Service 
(FWS) and the National Marine Fisheries Ser- 
vice (NMFS) proposed policy changes that 
would, among other things, make it easier to 
delist species and harder to add new ones. And 
in recent weeks, legislators in the US House of 
Representatives have gone further by introducing
about 12 bills aimed at altering the law itself.

Some of the bills would roll back protections 
for species including the American burying 
beetle (Nicrophorus americanus). Lawmakers 
say that this is to remove barriers to the activi- 
ties of businesses such as oil companies. Other 
bills propose fundamental changes to the law, 
for example by narrowing the range of habitats 
deemed necessary for organisms to recover. 


“The law needs to be updated to ensure it 
maintains its original intent and focus of spe- 
cies recovery, and not simply serve as a tool 
for endless litigation,” says Representative Rob 
Bishop, the Utah Republican who heads the 
House of Representatives Committee on Natu- 
ral Resources. The committee has spearheaded 
many of the bills under consideration.

“It’s pretty much the oil and gas companies that have been looking to have the beetle delisted.”

Decisions on whether or not to list a species under
the Endangered Species Act (ESA) often
draw legal challenges from industry and
environmental organizations. Special-interest
groups including the oil and gas industry have
fought the law since its enactment in the 1970s.
Many conservation scientists and environ- 
mental groups say the Republican legisla- 
tion would cripple the ESA by making it 
much harder to protect species that are now 
imperilled. One of the House bills would 
weaken safeguards for threatened species. 
Brett Hartl, government-affairs director
for the Center for Biological Diversity in
Washington DC, doubts that the bills will 
become law — even if the Republicans retain 
control of the US Senate and the House after 
the November midterm elections. Similar 
legislation that has been introduced over the 
past several years has foundered, he notes. 

Hartl worries more about the changes to 
the ESA proposed in mid-July by the FWS 
and the NMFS. “Those are very dangerous,”
he says, because they don’t require approval 
by Congress to take effect. The plans must 
undergo a 60-day public-comment period 
before President Donald Trump’s administra- 
tion can finalize the changes and implement 
them. Hartl thinks that they will probably 
make it through the process. 

But the plan is likely to face lawsuits, says 
Steve Holmer, vice-president of policy at the 
American Bird Conservancy in The Plains, 
Virginia. Litigants could challenge the propos- 
als on several grounds, using arguments that 
the changes would harm species — in direct 
opposition to the ESA’s original aim. 

One of the species that researchers worry 
about is the endangered American burying
beetle, which legislators have been trying to
strip of federal protections since 2013.

Habitat destruction in the twentieth century 
eliminated 90% of this insect’s historical range, 
which stretched across 35 states in the Midwest 
and the East Coast. Declining food sources 
also contributed to population declines. The 
US government added the beetle to the endan- 
gered-species list in 1989. 

The insect’s remaining habitat includes parts
of Kansas, Oklahoma and Nebraska, states
where gas- and oil-drilling companies hold
interests, says Louis Perrotti, director of conservation
programmes at Roger Williams Park
Zoo in Providence, Rhode Island. “It’s pretty
much the gas and oil companies that have been
looking to have the beetle delisted.”

The endangered American burying beetle inhabits areas that are of interest to oil and gas companies.

The most recent legislative salvo included 
an addition to the House's 2019 funding bill 
for the Department of Defense that would
have removed the insect from the endangered-species
list. After an outcry from some of the
lawmakers working to reconcile the House and 
Senate versions of the spending bill, legislators 
removed the addition early last week. 

Perrotti and his collaborators have been 
breeding burying beetles in captivity and 
releasing them into the wild to establish self- 
sustaining populations. The insects are impor- 
tant because they feed on carrion. “Without 
burying beetles, we'd be knee-deep in dead and
decaying carcasses,” says Perrotti. 

If lawmakers eventually succeed in delist- 
ing the insect, the repopulation project could 
lose major collaborators that receive federal 
funding. And there is a real risk of the beetle 
going extinct if legislators change the ESA, says 
Perrotti. Nevertheless, he and his collaborators 
will keep pushing to save the species. “A lot of 
people have put their blood and soul into this, 
including me.”


CLARIFICATION 

The World View ‘Preprints could promote 
confusion and distortion’ (Nature 559, 445; 
2018) omitted to mention that funders of 
the Science Media Centre (the author’s 
employer) include Springer Nature, 
Nature’s publisher. 




Climate 
as culprit 


Weather forecasters will soon 
provide instant assessments of 
global warming’s influence on 

heatwaves and floods. 


BY QUIRIN SCHIERMEIER 


The Northern Hemisphere is sweating through another
unusually hot summer. Japan has declared its record
temperatures a natural disaster. Europe is baking
under prolonged heat, with destructive wildfires in
Greece and, unusually, the Arctic. And drought-fuelled
wildfires are spreading in the western United States.

For Friederike Otto, a climate modeller at the University of Oxford,
UK, the past week has been a frenzy, as journalists clamoured for her
views on climate change’s role in the summer heat. “It’s been mad,”
she says. The usual scientific response is that severe heatwaves will
become more frequent because of global warming. But Otto and her
colleagues wanted to answer a more particular question: how had
climate change influenced this specific heatwave? After three days’
work with computer models, they announced on 27 July that their
preliminary analysis for northern Europe suggests that climate change
made the heatwave more than twice as likely to occur in many places.

Soon, journalists might be able to get this kind of quick-fire analysis
routinely from weather agencies, rather than on an ad hoc basis from
academics. With Otto’s help, Germany’s national weather agency is
preparing to be the first in the world to offer rapid assessments of
global warming’s connection to particular meteorological events. By
2019 or 2020, the agency hopes to post its findings on social media
almost instantly, with full public reports following one or two weeks
after an event. “We want to quantify the influence of climate change
on any atmospheric conditions that might bring extreme weather to
Germany or central Europe,” says Paul Becker, vice-president of the
weather agency, which is based in Offenbach. “The science is ripe to
start doing it.”

The European Union is interested too. The European Centre for
Medium-Range Weather Forecasts (ECMWF) in Reading, UK, is preparing
to pilot a similar programme by 2020 that will seek to attribute
extreme events, such as heatwaves or floods, to
human-induced climate change. If that works
well, a regular EU attribution service could be in
place a year or two later, says Richard Dee, head
of the EU’s Copernicus Climate Change Service
at the ECMWF. “It's ambitious, but doable,” says
Otto, who is also helping to set up the EU effort.

That weather agencies are contemplating such regular services
shows how far ‘attribution science’ has come since the first cutting-edge
research projects, more than a decade ago, tried to attribute individual
weather events to climate change¹. Now, after more than 170 studies
in peer-reviewed journals, attribution science is poised to burst out
of the lab and move into the everyday world². It still has difficulty with
some kinds of extreme weather phenomena, but as meteorological
services begin to offer attribution information routinely, the bigger
challenge is to work out how to make the studies helpful to the people
who might use them. “It’s one thing to make scientifically robust attribution
statements,” says Peter Walton, a social scientist at the University
of Oxford. “How to go about using that information is another thing.”


Global warming tripled the odds of Russia’s extreme heatwave in 2010, which exacerbated wildfires.


ATTRIBUTION 101 

The idea behind attribution science is simple enough. Disasters such 
as record-breaking heatwaves and extreme rainfall are likely to become 
more common because the build-up of greenhouse gases is altering 
the atmosphere. Warmer air contains more water vapour and stores 
more energy; the increasing temperatures can also change large-scale 
atmospheric circulation patterns. But extreme weather can also arise 
from natural cycles, such as the El Niño phenomenon that periodically
warms sea surface temperatures in the tropical Pacific Ocean. 

Researchers say that teasing out the role of human-induced global 
warming — as opposed to natural fluctuations — in individual weather 
extremes will help city planners, engineers and home-owners to under- 
stand which kinds of floods, droughts and other weather calamities are 
increasing in risk. And surveys suggest that people are more likely to sup- 
port policies focused on adapting to climate-change impacts when they 
have just experienced extreme weather, so quickly verifying a connection 
between a regional event and climate change, 
or ruling it out, could be particularly effective³.

Otto, the deputy director of the University 
of Oxford’s Environmental Change Institute, 
is a veteran of attribution science, having con- 
ducted more than two dozen analyses. On 
4 June, for instance, she and her colleagues 
completed a study focused on the southern 
edge of Africa, which had been suffering from a three-year drought. 
By early this year, the situation had become so dire in South Africa’s 
Western Cape Province that officials in Cape Town had warned they 
would soon hit ‘Day Zero’, when the region would run out of water to
serve basic needs — a first for a major city. 

As reports of Day Zero made international headlines, Otto and Mark 
New, a climate scientist at the University of Cape Town, decided that 
the event was a good candidate for an attribution study. Working in 
their spare time because they had no dedicated funding for the pro- 
ject, researchers from the Netherlands, South Africa, the United States 
and the United Kingdom started by defining the regional extent of the 
multi-year drought. They also created an index of its severity, which 
combined measurements of rainfall and heat. Then, the teams turned 
to the workhorses of attribution studies: complex computer models that 
mimic Earth’s climate. On each of five independent models, they ran 
thousands of simulations. Some of these took into account observed 
levels of human-generated greenhouse gases; others ran with natural 
concentrations of the gases, as if the Industrial Revolution had never 
happened. The researchers compared how many times a drought of 
similar severity and extent turned up in the thousands of test runs. Most
of the teams used their own dedicated computers, but the Oxford branch
of the study conducted its simulations on the weather@home model 
ensemble, a distributed computing framework that uses the idle time 
of thousands of volunteers’ personal computers. 
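
The comparison at the heart of that method can be sketched as a probability ratio: how often a drought of the chosen severity appears in runs with observed greenhouse-gas levels, divided by how often it appears in runs with natural concentrations. The following is a minimal illustration under stated assumptions, not the researchers' code. The function name, the severity-index values and the threshold are hypothetical, and the run counts are made up so that the ratio comes out at about three.

```python
def probability_ratio(factual_runs, counterfactual_runs, threshold):
    """Ratio of event probabilities: with human emissions vs without.

    Each list holds one drought-severity index per simulated run
    (hypothetical values). A run counts as an event when its index is
    at or below the threshold, i.e. at least as dry as the real drought.
    """
    p_factual = sum(x <= threshold for x in factual_runs) / len(factual_runs)
    p_counterfactual = (
        sum(x <= threshold for x in counterfactual_runs) / len(counterfactual_runs)
    )
    if p_counterfactual == 0:
        raise ValueError("no events in the counterfactual runs; ratio undefined")
    return p_factual / p_counterfactual


# Made-up ensembles, not study data: 30 droughts in 1,000 runs with observed
# greenhouse gases, 10 droughts in 1,000 runs with natural concentrations.
factual = [-2.0] * 30 + [0.0] * 970
counterfactual = [-2.0] * 10 + [0.0] * 990

ratio = probability_ratio(factual, counterfactual, threshold=-1.5)
print(f"Event is about {ratio:.1f} times as likely with human emissions")
```

In practice the hard part is upstream of this step: defining the event and its severity index, and running enough simulations on several independent models for the event counts to be meaningful.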

By the time the team met in June, rains had returned to South Africa 
and had pushed Day Zero away. But the scientists were still chasing the 
causes of the mega-drought, which could help to determine whether the 
region might face a repeat anytime soon. Coordinating a four-way Skype 
call from her office in Oxford, Otto looked relieved when colleagues 
agreed that the analysis had yielded a result. “Global warming has tripled 
the odds of three consecutive dry years in the region,” she says. 

The findings came just in time for Roop Singh, a climate-risk adviser 
at the Red Cross Red Crescent Climate Centre in The Hague, the 
Netherlands, to present the results at a conference on climate-change 
adaptation in Cape Town two weeks later. Researchers there didn't find
the results particularly shocking, Singh says — but they did trigger lively 
discussions about whether the increase in drought risk could help to 
justify increased investment in diversifying water sources in Cape Town. 
Otto's study was published on 13 July, before peer review, at the website 
of World Weather Attribution (see go.nature.com/2tyjezc), a partner- 
ship of six research institutes (including the University of Oxford) that 
joined together in 2014 to analyse and communicate the possible effect 
of climate change on extreme weather events. 

Although Cape Town avoided Day Zero this year, policymakers in the 
region say Otto's results send a sobering warning to water authorities that
might be inclined to downplay the risk of global warming. “This is an
incredibly strong message which we cannot afford to ignore,” says Helen
Davies, director of green economy in the Western Cape Government's
Department of Economic Development and Tourism. “We may need
to work on a radically new approach to water management,” she says.

The work by Otto's team joins a rapidly growing corpus of studies 
on climate attribution. From 2004 to mid-2018, scientists published 
more than 170 reports covering 190 extreme weather events around 
the world, according to an analysis by Nature, which builds on previ- 
ous work by the publication CarbonBrief (see go.nature.com/2jypsyc). 
So far, the findings suggest that around two-thirds of extreme weather
events studied were made more likely, or more severe, by human-induced
climate change (see ‘Attribution science’). Heat extremes
made up more than 43% of these kinds of events, followed by droughts
(18%) and extreme rain or flooding (17%). In 2017, for the first time,
studies even stated that three extreme events would not have occurred
without climate change: heatwaves in Asia⁴ in 2016, global record heat
in the same year⁵, and marine warming in the Gulf of Alaska and the
Bering Sea⁶ from 2014–16. But in 29% of cases in Nature's analysis, the
available evidence either showed no clear human influence or was too
inconclusive for scientists to make any judgement.

“If we want science to be part of the discussion, we need to say something fast.”

Sometimes studies seem to come to opposite conclusions about a particular
event. One study about a 2010 heatwave in Russia found that
its severity was still within the bounds of natural variability⁷; another
analysis determined that climate change had made the event more likely
to occur⁸. The media found the results confusing, but climate scientists
say the discrepancy is not surprising because the two studies looked at
different issues: severity and frequency⁹. According to Otto, “The example
goes to show that framing and communicating attribution questions
is a real challenge.” But researchers have become more sophisticated
since then about how they set up and present their studies, she adds.

Attribution science
Researchers have published more than 170 studies* examining the role
of human-induced climate change in 190 extreme weather events.
*Studies from 2004–18 collated by Nature and CarbonBrief. †Heat includes heatwaves and wildfires;
Oceans includes studies on marine heat, coral bleaching and marine-ecosystem disruption.


RAPID REPORTS 

The South Africa study could have been done faster, had the researchers 
been able to spend all their time on it. This year’s work during the 
European heatwave was not the first rapid study: in 2015, for instance, 
during another sweltering heatwave in Europe, an international team of 
researchers (including Otto) found within weeks how climate change 
had made comparable heatwaves four times more likely in some Euro- 
pean cities, and at least twice as likely over much of the continent (see 
go.nature.com/2uzpoxc). Meteorological agencies plan to work even 
faster when they put these experimental methods into regular opera- 
tion. Over the past few months, Otto has talked extensively with the 
staff of the German weather service, briefing them on how to conduct 
attribution studies using the best approaches. On 21 June, she signed an 
agreement with the agency that provides free use of the University of 
Oxford’s weather@home model. Meanwhile, the Copernicus Climate 
Change Service has asked Otto and two of her colleagues to write a paper 
describing workflows and methods for conducting rapid attribution 
studies, to be published by September. 

Otto says a rapid attribution service is needed because questions 
about the role of climate change are regularly asked in the immediate 
aftermath of extreme weather events. “If we scientists don't say any- 
thing, other people will answer that question not based on scientific 
evidence, but on whatever their agenda is. So if we want science to be 
part of the discussion that is happening, we need to say something 
fast,” she says.

Some scientists might feel uncomfortable if weather forecasters 
announce results before work has gone through peer review. But in these 
cases, the methods have already been extensively reviewed, says Gabriele 
Hegerl, a climate scientist at the University of Edinburgh, UK. Hegerl 
is also a co-author of a 2016 report by the US National Academies, 
which concluded that the science of attribution has advanced rapidly 
and would benefit from being linked to operational weather prediction. 
“It can be really useful to have results quickly available for event types 
we understand reasonably well, such as heatwaves,” she says. “You don't 
need to peer review the weather forecast,” adds Otto. 

But not all of the science involved in attribution studies is settled, 
Hegerl says. Computer algorithms still struggle to model severe local 
storms that result from the rapid convection of air, such as small hail- 
storms and tornadoes, so scientists can’t say whether climate change 
has made these events more likely. Reliable attribution is also difficult 
or even impossible where long-term climate records are still lacking, 
such as in some African countries. And there might still be natural 
climate variability that is not fully visible in the relatively short record 
of direct climate observations. To trace very-long-term climate fluc- 
tuations — such as those caused by changes in atmospheric-pressure 
patterns or sea surface temperatures that cycle once every few dec- 
ades — researchers must rely on low-resolution proxy data, such as 
from tree rings. That this variability doesn’t always show up in direct 
observations does create some uncertainty in studies, particularly for 
research on drought attribution, says Erich Fischer, a climate scientist 
at the Swiss Federal Institute of Technology in Zurich. 

At a meeting in Oxford in 2012, some critics questioned whether 
climate scientists could be confident about the conclusions of attribu- 
tion studies, given the lack of observational data and weaknesses in the 
climate models of the time. But since then, doubts have largely been 
quelled. Researchers now run the studies using several independent 
climate models, which reduces uncertainty because they can look for 
results that concur. And scientists are more careful about how they make 
probabilistic claims. “Extreme-event attribution has made a lot of progress 
since it began with scant resources,” says Fischer. “It may still not 
work for small hailstorms or tornadoes. But attribution claims are now 
fairly robust for any large-scale weather patterns that can be represented 
by state-of-the-art climate models.” 
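
As a deliberately simplified illustration of what "looking for results that concur" can mean, the sketch below checks whether several models agree that an event has become more likely. The model names and probability ratios are invented, and the agreement test is an assumption made for illustration, not the procedure used by the researchers quoted here.

```python
from statistics import median

# Hypothetical probability ratios for one event from four independent models.
model_ratios = {
    "model_a": 3.1,
    "model_b": 4.4,
    "model_c": 2.6,
    "model_d": 5.0,
}


def models_concur(ratios: dict[str, float], threshold: float = 1.0) -> bool:
    """True if every model puts the probability ratio above the threshold."""
    return all(r > threshold for r in ratios.values())


def summarize(ratios: dict[str, float]) -> tuple[float, float, float]:
    """Return the (lowest, median, highest) ratio across the models."""
    values = sorted(ratios.values())
    return values[0], median(values), values[-1]


low, mid, high = summarize(model_ratios)
if models_concur(model_ratios):
    print(f"All models agree the event became more likely "
          f"({low} to {high} times, median {mid:.2f}).")
else:
    print("Models disagree on the direction of change; no robust claim.")
```

Reporting the spread alongside the median keeps the claim probabilistic rather than tied to a single model.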


UNCLEAR IMPACT 

In South Africa, Davies says Otto's latest study should help to press the 
case for new approaches to regional water management. “Meteorolo- 
gists assured us after the second year of drought that there was no way 
we were going to have a third dry year in a row. But we can't use the past 
any more for what might happen in the future. We need to learn to adapt 
to a changing climate, and we absolutely need attribution to do it right.” 
One of the lessons of the recent drought and the attribution analysis is 
that the Western Cape should not rely solely on rainfall to replenish its 
water supply, she says. Instead, it should diversify by tapping groundwa- 
ter and expanding its desalination and waste-water treatment facilities. 

But, in general, it’s hard to know what effect attribution studies are 
having, social scientists say. That's because it is difficult to tease out 
the impacts of these findings from other studies that forecast increased 
risks of extreme weather associated with climate change — or from the 
shock of the weather events themselves. Still, if attribution studies start 
appearing regularly in weather reports, rather than just in scientific 
journals, then their impacts could become much more conspicuous, 
says Jorn Birkmann, an expert in spatial and regional planning at the 
University of Stuttgart in Germany. “City and infrastructure planners 
who plan and approve new housing areas, hospitals or train stations 
need to consider risks of extreme weather events more precisely if these 
events are clearly attributed to climate change,” he says. 

Evidence from attribution reports could also feed into litigation on 
climate change, suggest Birkmann and James Thornton, the London- 
based chief executive of ClientEarth, an international group of envi- 
ronmental lawyers. Court cases that allege failure to prepare for the 
effects of climate change haven't yet cited attribution studies, Thornton 
says. But he thinks judges will increasingly rely on them to help decide 
whether defendants — who might be oil companies, architects or gov- 
ernment agencies — can be held liable. “Courts tend to give credibility to 
government data,” he says. “If attribution moves from science to public 
service, judges will be much more comfortable using the results.” 

At the German weather agency, Becker says he is convinced that 
attribution studies will become a valuable service for many parts of 
society. “It’s part of our mission to illuminate the links between climate 
and weather,” he says. “There is demand for that information, there is 
science to provide it, and we are happy to spread it.” 


Quirin Schiermeier is a senior reporter for Nature based in Munich, 
Germany. 

Late-night gridlock on Beijing’s roads. 


Set road charges in 
real time to ease traffic 


Track vehicles to link tolls with demand and cut congestion, 
urge Peter Cramton, R. Richard Geddes and Axel Ockenfels. 


Traffic jams cost us time, money and 
lives. In 2016, the average US driver 
spent 42 hours in congestion during 
peak hours, and those in Los Angeles, California, 
spent 104 hours. New Yorkers can 
walk almost as fast as vehicles crawl along 
streets in central Manhattan. Being stuck 
is frustrating and stressful. Jammed cars can 
burn up to 80% more fuel than those in free 
traffic. This leads to more air pollution and 
greater carbon dioxide emissions, increases 
the incidence of heart attacks, strokes and 
asthma and contributes to poor infant 
health, especially among city dwellers. 

The economic damage of congestion last 
year in the United States, Germany and Britain 
totalled US$461 billion. Such costs are 
rising as the world’s population grows and 
urbanizes. The six most congested countries 
in 2017 were Thailand, Indonesia, Colombia 
and Venezuela, with Russia and the United 
States tied in fifth place. 

The usual response is to call for more 
roads. But they don't diminish traffic. Quite 
the opposite: more drivers move in. Nor will 
artificial-intelligence systems, ride-hailing 
services and autonomous cars ease the gridlock. 
Navigation systems draw more drivers 
to certain routes and can spread congestion to 
formerly quiet streets. The ride-hailing apps 
Uber and Lyft, for instance, have increased 
traffic because people make more car jour- 
neys. Without ride-hailing, according to one 
US survey, around half of such trips between 
2014 and 2016 would have either not been 
made, or would have been done on foot, 
bicycle or public transport. Similarly, self-driving 
cars use roads and fuel efficiently (and 
reduce accidents), but those gains might be 
swamped by an increased desire for cheap 
and easy transport. 

Singapore has experimented with electronic pricing on roads in its inner zones. 

The answer lies in dynamic road pricing. 
The location of individual cars can now be 
tracked to within a few centimetres. This 
makes it feasible to measure and price road 
use in real time according to demand. If the 
price were set at the right level, enough car 
drivers would choose to drive at a different 
time or take a different route or mode of 
transport to cut congestion. Limited road 
space would be managed in a similar way 
to airfares, electricity, hotel rooms and train 
journeys. Uber already balances demand and 
supply of its cars through surge pricing. 

Overall, dynamic pricing does not drive 
motorists away. It can double the capacity of 
a congested route in peak times by preventing 
traffic jams — just as managing fisheries can 
ease overfishing. Pollution and stress would 
decrease. The funds raised could be used to 
improve roads and public transport, and 
to reduce fuel and other taxes. 

Fixed pricing schemes have been tried. 
Since 2015, 5,000 volunteers in Oregon 
have been trialling a tax on miles travelled 
by car. Around a dozen countries, includ- 
ing Germany, Austria and Switzerland, 
follow a similar approach for lorries. In 
recent decades, some cities in the United 
States and elsewhere, including Singapore 
and Stockholm, have experimented with 
electronic charging for roads in their inner 
zones. 

But such schemes do little for congestion, 
because prices often do not change meaningfully 
with supply and demand. A low price 
does little to mitigate jams at peak times. A 
price that is fixed high to eliminate peak congestion 
would be as inefficient and unacceptable 
as having Thanksgiving airfares all year. 

24 | NATURE | VOL 560 | 2 AUGUST 2018 

Instead, policymakers and city managers 
need to track cars’ positions and adjust 
charges continuously, depending on how busy 
the roads are. 

What has stopped them? There are three 
research gaps: technology needs refining to 
measure road use by cars accurately and at 
low cost; an equity and privacy framework 
needs hammering out; and the implementa- 
tion of network-wide, real-time road pricing 
needs better economic and computational 
modelling. 


FREE RIDE 
Congestion is pervasive because motorists 
take no account of the cost that they impose 
on others. Prices should instead reflect 
motorists’ impact on each other. The sys- 
tem would operate in a similar way to an 
electricity market, making road space a 
commodity that can be bought and sold. An 
independent operator for the system would 
determine prices on each road segment to 
balance supply and demand, and thereby 
maximize the network's value to users while 
keeping traffic flowing. 

Prices would be levied on all roads in a 
region. Charges could vary with time and 
place every ten minutes, say, according to 
traffic conditions. Prices would thus respond 
to lane closures, weather and sporting events, 
as well as to peak commuting times. 
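
To make the idea concrete, here is a minimal sketch of how a charge on one road segment might be revised every ten minutes. Everything in it is an assumption made for illustration: the proportional rule, the `gain` and `max_step` parameters, the capacity figure and the segment name are hypothetical, and a real operator would set prices across the whole network rather than one segment at a time.

```python
from dataclasses import dataclass


@dataclass
class RoadSegment:
    name: str
    capacity: float      # vehicles per interval the segment carries without jamming (assumed)
    price: float = 0.0   # current charge in dollars


def update_price(segment: RoadSegment, demand: float,
                 gain: float = 10.0, max_step: float = 2.0) -> float:
    """Nudge the price up when demand exceeds capacity and down when it does not."""
    excess = (demand - segment.capacity) / segment.capacity
    step = max(-max_step, min(max_step, gain * excess))  # limit the size of each change
    segment.price = max(0.0, segment.price + step)       # the charge never drops below zero
    return segment.price


# Four consecutive ten-minute intervals on one hypothetical segment.
segment = RoadSegment(name="ring-road-east", capacity=1000)
history = [update_price(segment, demand) for demand in (600, 1100, 1300, 900)]
print(history)
```

With demand below capacity the charge stays at zero; it rises while the segment is oversubscribed and eases again once traffic thins.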

Research is needed to estimate the best 
levels for market prices. Most of the time, 
they would be near zero. On popular routes 
in Europe or the United States, an urban 
commute might cost as much as $20 — but 
the trip would take 30 minutes rather than, 
say, 45-90 minutes. The actual price would 
depend on how easily drivers can shift away 
to other times or modes of transport. 

Fees would be tailored to vehicle types. 
Lorries would pay more. An autonomous 
vehicle, driven using algorithms to promote 
free flow and possibly coupled with other cars 
to decrease spacing, uses less road capacity 
than a standard car and would pay less. 

An advantage of dynamic pricing is that 
it includes the means with which to charge 
the full social cost of a vehicle’s use — both 
congestion and pollution. Prices could be 
varied to keep air-quality measures, such as 
particulate matter, within limits. Although 
this would not affect the number of cars at 
peak times, it would increase the road price 
for dirty vehicles and make room for clean 
ones. Electric cars, for instance, would pay 
less. This would be a cheaper and less-intru- 
sive way of fighting pollution than banning 
diesel cars from city centres, as is being dis- 
cussed and implemented in Germany. 
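
The two ideas above, fees that depend on vehicle type and charges that reflect pollution, could be combined along the following lines. This is a sketch under stated assumptions: the multipliers, surcharges and the doubling during an air-quality alert are invented numbers, and the function is hypothetical rather than part of any existing tolling system.

```python
# Hypothetical share of road capacity used, relative to a standard car.
ROAD_SPACE_FACTOR = {"lorry": 2.5, "car": 1.0, "autonomous_car": 0.7}

# Hypothetical pollution surcharge per trip, in dollars.
EMISSIONS_SURCHARGE = {"diesel": 1.5, "petrol": 0.75, "electric": 0.0}


def trip_charge(congestion_price: float, vehicle: str, fuel: str,
                air_quality_alert: bool = False) -> float:
    """Scale the congestion price by road space used, then add a pollution surcharge."""
    surcharge = EMISSIONS_SURCHARGE[fuel]
    if air_quality_alert:
        surcharge *= 2.0  # assumed rule: dirty vehicles pay double when limits are at risk
    return round(congestion_price * ROAD_SPACE_FACTOR[vehicle] + surcharge, 2)


print(trip_charge(4.0, "lorry", "diesel"))
print(trip_charge(4.0, "autonomous_car", "electric"))
print(trip_charge(4.0, "car", "diesel", air_quality_alert=True))
```

Keeping the congestion and pollution components separate lets the operator tighten one without distorting the other.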

Prices would be tracked using naviga- 
tion apps such as Google Maps and Waze. 
Such tools would present both real-time 
information and forecasts of future prices, 
much as they do today for trip duration. 
Prices would be integrated into fares for taxis 
and ride-hailing services. 

As in other markets, changes in pricing 
should be smooth to avoid price shocks. And 
consumers need time to react. Advance pur- 
chases, such as the ability to buy a pass for the 
daily commute, would let consumers plan and 
avoid the risk of expensive real-time prices. 
The operator for such purchases could offer 
packages as part of a road-use plan. 

Not everyone will need to respond to 
prices. Dynamic pricing would still reduce 
congestion even if most drivers do not react. 
Yet more motorists would adapt as they 
became familiar with the system. Some 
would commute earlier or later; others would 
cycle or take the train or bus. In the long run, 
other adjustments, such as moving house, 
more work flexibility, investments in public 
transport and smarter mobility technologies, 
are possible. People already make similar 
trade-offs — the price of current road use is 
the length of delay. 

There is much to learn about how best 
to manage a road network and how to 
target investments. Aggregate data about 
pricing and transport choices will need to be 
made publicly available. This would allow 
researchers and innovators to glean insights 
about the effectiveness of different measures 
and to develop apps that help motorists. 


CONCERNS 

Equity. Road-pricing schemes are often 
criticized as perpetuating inequality. 
People with lower incomes might be less 
able to afford to drive at popular times of 
day than those in wealthier groups. More 
research on such consequences is needed. 
But the problem could be smaller than 
feared. Pricing can make everybody better 
off — even before the revenue from conges- 
tion pricing is redistributed. For instance, 
suppose the left lanes of a busy multi-lane 
highway are priced at peak times. Because 
this increases throughput on the left lanes, 
there are fewer motorists on the right lanes, 
so everyone wins. 

Even a worker with less money, who abso- 
lutely must be at work at 8 a.m. and who has 
no access to public transport or other travel 
options, can be better off — even if all lanes 
on all roads are priced. For example, suppose 
the free-flow travel time is 30 minutes, the 
expectation is 60 minutes and the maximum 
is 90 minutes. Today, without road pricing, 
the worker must depart at 6:30 a.m. to get 
there on time. With efficient pricing, if the 
worker cannot afford the price to depart at 
7:30 a.m., she or he can continue to depart at 
6:30 a.m. at zero cost, say, but his or her travel 
time will be halved and so fuel costs and pol- 
lution will be reduced. Moreover, the revenue 
from congestion pricing could be given back 
to motorists, for example, through a lower 
road tax and less fuel duty. It could also be 
invested to improve public transport. 
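
The arithmetic of the commuter example above can be checked in a few lines. The travel times and the 8 a.m. deadline come from the text. The calendar date is arbitrary and the function name is hypothetical.

```python
from datetime import datetime, timedelta

FREE_FLOW, EXPECTED, WORST = 30, 60, 90   # minutes, from the example above
must_arrive = datetime(2018, 8, 2, 8, 0)  # 8 a.m.; the date itself is arbitrary


def latest_safe_departure(arrival: datetime, travel_minutes: int) -> datetime:
    """Latest departure that still guarantees arrival by the deadline."""
    return arrival - timedelta(minutes=travel_minutes)


# Without pricing, the worker has to plan for the worst case.
unpriced_departure = latest_safe_departure(must_arrive, WORST)
# With efficient pricing, traffic flows freely, so 30 minutes is enough.
priced_departure = latest_safe_departure(must_arrive, FREE_FLOW)

# Leaving at 6:30 a.m. regardless, the typical 60-minute trip becomes 30 minutes.
minutes_saved = EXPECTED - FREE_FLOW

print(unpriced_departure.strftime("%H:%M"),
      priced_departure.strftime("%H:%M"),
      minutes_saved)
```

The worker who keeps the early, unpriced slot still gains half an hour on a typical day.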

The current situation, by contrast, is unfair. 
The free use of roads is equivalent to govern- 
ments subsidizing people who impose the 
biggest congestion and pollution costs on 
society. Roads are an essential service. The 
norm for other essential services such as elec- 
tricity, gas, water and communications is for 
consumers to pay for what they use. The fact 
that road use has not been priced is a fluke of 
history — until recently, technology did not 
allow for the measurement and communica- 
tion of pricing. 


Scepticism. Voters and politicians under- 
estimate the benefits of pricing. Public 
support builds once people experience such 
schemes. For instance, before Stockholm 
introduced a €2 (US$2.3) charge during 
peak hours for vehicles entering the inner 
city, two-thirds of residents were 
against the plan. About 2 years later, 
when the policy had reduced traffic 
by 20%, two-thirds of people were in 
favour of it. After a similar scheme was 
trialled in Milan, Italy, 80% of people voted 
to extend charges to more roads and vehicle 
types. More research is needed on how to 
communicate the impact of road pricing. 


Privacy. Road pricing raises concerns about 
personal data. Monitoring and enforcement 
require that the system operator know the 
location of each vehicle. Technically, this is 
easy and cheap: each vehicle would have a 
Global Positioning System (GPS) device. 
Strict privacy rules governing the operator 
imposing the charges would ensure that no 
individual data would be shared with others, 
as is standard in telecommunications and 
related industries. Modern cryptography 
makes it possible for the operator to run the 
market without any human having access 
to the data, yet prove that the rules are 
followed faithfully. Research is needed on 
whether such advanced systems give motor- 
ists comfort, although their embrace of apps 
for navigation and ride-hailing suggests 
that users are happy to accept some loss of 
privacy for improved services. 


MAKE IT HAPPEN 
The first step is to get devices in vehicles to 
measure road use. Singapore plans to install 
tracking and payment technology in all cars 
from 2020. Oregon's volunteers have GPS 
equipment on board so the state can levy a fee 
per mile. These are excellent starting points. 
Governments and city authorities should 
establish independent operators to introduce 
and adjust prices for times and locations. 
Researchers should study the impacts on 
motorists’ behaviour, traffic conditions and 
pollution, the distribution of benefits and 
costs and public responses. This will involve 
studies by behavioural scientists of individual 
perceptions and changes in driving behav- 
iour; by transport engineers, economists 
and computer scientists of aggregate traffic 
flows through the network; and by market 
designers, to evaluate the effectiveness of the 
underlying incentive mechanisms. 

Service providers, such as Google, Apple, 
Uber and Lyft, are likely to seize the oppor- 
tunity to develop innovative tools that ena- 
ble consumers to make informed decisions. 
These tools would integrate past driving 
behaviour and relevant price information. 

Let's get a move on. Dynamic pricing is the 
only way forward for roads. 

Carl Woese discovered the ‘third domain’ of life — the Archaea. 


Scaling life’s tree 


John Archibald praises a compelling guide to 
3 billion years of life — and its molecular historians. 


The Tangled Tree 
DAVID QUAMMEN 
Simon & Schuster (2018) 

In The Tangled Tree, celebrated science writer David Quammen tells 
perhaps the grandest tale in biology: how scientists used gene 
sequencing to elucidate the evolutionary relationships between living 
beings. Charles Darwin called it the ‘great Tree of Life’. But as 
Quammen reveals, at the molecular level, life's history is more 
accurately depicted as a network, a tangled web through which 
organisms have been exchanging genes for more than 3 billion 
years. This perspective is indeed radical, and he presents the 
science — and the scientists involved — with patience, candour and flair. 

Centre stage in Quammen’s narrative is Carl Woese (1928–2012), the 
US microbiologist best known as the discoverer of the Archaea 
(Archaebacteria) — the ‘third domain’ of life. Inspired by the 
visionary musings of Francis Crick, Linus Pauling and Emile 
Zuckerkandl, Woese committed himself to molecular phylogenetics at a 
time when this powerful approach to the study of evolution was in its 
infancy. During the 1960s and 1970s, the Woese Laboratory at the 
University of Illinois at Urbana-Champaign developed and refined 
techniques for deriving sequence information from molecules of 
ribosomal RNA (core components of the cell’s protein-synthesizing 
factory, the ribosome). Sequences were painstakingly obtained from diverse 
microbes and used as molecular yardsticks to 
infer how the organisms were related to one 
another and to animals and plants. Through 
the following two decades, as molecular 
sequencing got easier and cheaper, Woese’s 
‘three-domains’ tree — comprising archaea, 
bacteria and the nucleus-containing eukary- 
otes — served as the definitive road map for 
the field of comparative genomics. In many 
ways, it still does. 

But life is complicated, and so are the 
scientists who study it. In his breezy, con- 
versational style, Quammen shepherds us 
up and down life’s vast timeline, and across 
150-plus years of exciting, often controver- 
sial discoveries. He handles the complexities 
with humour and clarity (he’s right: some 
ribosomes do look like rubber ducks). We 
learn about the seeds of “tree thinking” in 
biology, before and after Darwin's 1859 
On the Origin of Species. We learn of a time 
when a natural classification of micro- 
organisms was considered impossible (they 
were deemed morphologically too simple, 
physiologically too variable). We learn how 
molecular sequencing helped test and even- 
tually prove the endosymbiont hypothesis 
for the origin of mitochondria and chloro- 
plasts; these eukaryotic organelles are now 
known to have evolved from once free- 
living bacteria. 

And we learn that although molecular 
phylogenetics provided the means with 
which to build a universal tree of life that 
includes microbes, it also provided the 
data that ultimately led us to question the 
precise nature of the tree. From the late 
1990s onwards, with dozens and eventually 
thousands of complete genome sequences 
in hand, biologists began to realize that the 
horizontal exchange of genes between dis- 
tantly related organisms is an important evo- 
lutionary force. (Quammen also reminds us 
that, as early as 1963, medical microbiologist 
Tsutomu Watanabe and colleagues provided 
evidence for horizontal gene transfer as a 
mediator of antibiotic resistance in bacteria.) 
Because genes have “moved sideways”, not 
all genes in a given genome share the same 
history. Current evidence suggests that this 
is also true for at least some macroorganisms 
(such as plants). The tree of life is tangled, 
some branches hopelessly so. 

At times, this master storyteller’s book 
reads like a travelogue. It brims with 
revelations from dozens of interviews 
with key players in their native habitats: 
the late Lynn Margulis, US champion of 
endosymbiotic theory; former Woese Lab 
members George Fox, Mitchell Sogin and 
Linda Bonen; and the environmental- 
DNA-sequencing legend and three-domains 
defender Norman Pace. Here too are “the 
four horsemen” of the gene-transfer apoca- 
lypse: William Martin, Jeffrey Lawrence, 
Peter Gogarten and Ford Doolittle. 

Some of the stories are laugh-out-loud 
funny. In one, Woese’s collaborator Charles 
Vossbrinck — an “openhearted bear of a 
man” — picked up a tipsy, pontificating 
Woese at a barbecue and threw him into the 
bushes. (Their friendship survived.) Other 
tales are shockingly intimate. Woese's last 
months and weeks with pancreatic cancer, 
as revealed by those closest to him, make for 
painful, albeit illuminating reading. I was 
surprised, for instance, to learn that Woese 
believed in a deity. 

The Tangled Tree traces the full arc of 
Woese’s life and career. We see the fiercely 
determined young scientist struggling to 
collect the data that he intuited would be 
important, and the brooding, combative 
mid-career professor fighting to have his 
beloved archaea and three-domains 
tree accepted by the scientific community. 
Finally, there is the jaded, curmudgeonly 
legend wracked by a Darwin complex. None 
of the accolades showered on Woese seemed 
to matter (he and many others clearly felt he 
deserved a Nobel prize, but he never got 
one). Around 2010, Woese and Canadian 
science historian Jan Sapp began to collabo- 
rate on a book tentatively entitled Beyond 
God and Darwin. The project never moved 
beyond Sapp’s draft introduction, on which 
Woese wrote: “Jan, you accord Darwin 
so much more substance than the bastard 
deserves.” 

Above all, Quammen reminds us that 
science is an imperfect, highly social activ- 
ity. It happens in labs — but also in hallways 
and airports, over pizza or coffee. And as 
with any other human endeavour, egos and 
reputations play a huge part. Friendships are 
forged, broken and mended over perceived 
or actual slights in the literature or at confer- 
ences. The actual data matter less often than 
we would like to admit. 

To what extent is the tree metaphor still 
‘useful’? On this thorny question, Quammen 
is clear: among practising scientists, opin- 
ions differ greatly. Horizontal gene transfer 
is here to stay — it’s now a question of how, 
how much, how important and between 
which organisms. And it is here that our 
twenty-first-century science connects back 
to the centuries-old struggle to classify and 
make sense of the world around us. At root, 
science and philosophy are interwoven in 
ways that many of us fail to realize, a fact to 
which Quammen is wisely alert. 

Department of Biochemistry & Molecular 
Canada, and the author of One Plus One 

Books in brief 




The Scientific Journal 

Alex Csiszar UNIV. CHICAGO PRESS (2018) 

Journals form the canon of scientific knowledge. But how, asks 
historian Alex Csiszar, did it come to bear “so much epistemic 
weight”? Focusing mainly on France and Britain during the turbulent 
nineteenth century, he unpicks the knotted roots of journals from 
the Royal Society’s Philosophical Transactions to Annales des sciences 
naturelles, and touches on the role of luminaries such as Nature’s 
first editor, Norman Lockyer. Amid fresh convulsions in scholarly 
publishing, much here resonates — not least, how commercial 
interests have shaped science communication almost from the start. 


Getting to Zero 

Sinéad Walsh and Oliver Johnson ZED (2018) 

In 2014, as West Africa’s Ebola crisis exploded, 28-year-old 
physician Oliver Johnson was co-running the isolation unit in Sierra 
Leone’s main hospital; Sinéad Walsh was Irish ambassador to the 
country and head of Irish Aid. Their in-depth memoir enshrines 
distinct perspectives on the front line of a fraught epidemic, to offer 
a nuanced analysis: we see both the Herculean efforts on the ground, 
and the humanitarian response, warts and all (see also P. Piot Nature 
537, 484-485; 2016). Among the lessons learnt, the need to respect 
local ‘citizen medics’ and collaborate with governments is pure gold. 


Reader, Come Home 

Maryanne Wolf HARPER (2018) 

This rich study by cognitive scientist Maryanne Wolf tackles an 
urgent question: how do digital devices affect the reading brain? 
Wolf explores the “cognitive strata below the surface of words”, the 
demotivation of children saturated in on-screen stimulation, and the 
power of ‘deep reading’ and challenging texts in building nous and 
ethical responses such as empathy. She advocates “biliteracy” — 
teaching children first to read physical books (reinforcing the brain’s 
reading circuit through concrete experience), then to code and use 
screens effectively. An antidote for today’s critical-thinking deficit. 


Fly Girls 

Keith O’Brien HOUGHTON MIFFLIN HARCOURT (2018) 

Shredded wings, broken propellers, stalled engines: in the 1920s, 
aviation was insanely risky. Undeterred, a select cadre of women 
embraced US aeroplane racing. In this engrossing mix of group 
biography and technology history, Keith O’Brien follows the lives of 
five: Amelia Earhart, Florence Klingensmith, Ruth Elder, Ruth Nichols 
and Louise Thaden. Earhart became a celebrity before disappearing 
over the Pacific Ocean; others found their prowess no match for 
sexism. The brilliant record-breaker Nichols, for instance, never flew 
professionally after the Second World War, and killed herself in 1960. 


Early Rock Art of the American West 

Ekkehart Malotki and Ellen Dissanayake UNIV. WASHINGTON PRESS (2018) 
The ancient geometric petroglyphs and pictographs of the 
American West — pecked into or painted on boulders and canyon 
walls — are beautiful enigmas. In this fascinating volume, linguist 
Ekkehart Malotki and scholar Ellen Dissanayake parse images 
created up to 15,000 years ago by Palaeoamericans from Arizona 
to Idaho, speculating about their origins and functions. Alongside 
Malotki’s stunning photographs of some 200 examples, the 
authors recontextualize the relics as products of ritualistic activity 
(‘artification’) rather than symbolic artworks. Barbara Kiser 

Part of the ring tunnel at the Large Hadron Collider near Geneva, Switzerland. 


A rational reductionist 
argues his case 


Robert Crease enjoys an eminent physicist’s turn as the 
ideal witness to science and its history. 


Third Thoughts 
STEVEN WEINBERG 
Belknap (2018) 

In his book The First Three Minutes (1977), physicist and Nobel 
laureate Steven Weinberg pictured his ideal reader 
as “a smart old attorney” who might not know 
much science, but “expects nonetheless to 
hear some convincing arguments”. That’s still 
his approach in Third Thoughts. This essay 
collection (his third for a lay readership, hence 
the title) ranges widely over science, the history 
of science and current affairs, taking on 
everything from dark energy and quantum 
field theory to socio-economic inequity and 
the wastefulness of crewed space missions. 

The volume mixes new pieces with others 
previously published in The New York Review 
of Books. Weinberg acknowledges the help he 
received from the review’s formidable editor, 
the late Robert Silvers. He clearly apprenticed 
well: if his ideal reader is an astute lawyer, 
Weinberg might be described as an ideal witness. 
He is clear, to the point, frank and transparent 
about his perspective — “rationalist, 
realist, reductionist, and devoutly secular”. 

Among half a dozen pieces on particle 
physics are lucid explications of, for instance, 
the Higgs boson, Hilbert space and the Large 
Hadron Collider. Weinberg has a knack for 
capturing a complex concept in a succinct, 
unforgettable image. He compares quantum 
superposition, in which a particle has two 
states at once, to a musical chord; when measurement 
collapses the particle to one state, it 
“somehow shifts all the intensity of the chord 
to one of the notes, which we then hear 
on its own”. And he describes the discovery 
that nature obeys symmetries whose conse- 
quences can be worked out as “like having a 
spy in the enemy's high command”. 

A few of the essays delve into history. 
Weinberg has irked professionals in the field 
by venturing into their territory, but not 
because he gets his facts wrong. As a self- 
confessed ‘Whig’ historian, he believes that 
the past should be judged by values of the 
present (the term springs from the name of 
the long-defunct British political party whose 
members thought history had been building 
towards Parliamentary government). Wein- 
berg offers a curious panorama that demotes 
certain canonical figures, such as Democritus, 
Francis Bacon and René Descartes. 

The ancient Greek philosopher Demo- 
critus didn’t make observations, so, although 
he correctly proposed that matter is made 
of atoms, Weinberg argues that he “was 
wrong about how to learn about the world”. 
Yet Democritus is of supreme importance 
for the understanding of science history (as 
Weinberg admits) because he inspired atomic 
theorists of the early modern period, such as 
Robert Boyle and other corpuscular philosophers, 
in their efforts to explain nature 
without using theology or teleology. Similarly, 
Bacon and Descartes, although sometimes 
scientifically misguided by today’s stand- 
ards, were vastly important for ushering in the 
mechanical thinking that Weinberg himself 
practises. Two essays here convey Weinberg’s 
responses to critics of his Whig convictions, 
but I wish he had put the arguments and 
counter-arguments more forcefully and at 
greater length. 

Weinberg is at his most interesting when 
probing the big uncertainties in physics. Ulti- 
mately, he is not sure, for instance, that he 
knows what an ‘elementary particle’ actually 
is, or how best to interpret quantum mechan- 
ics. These moments reveal Weinberg’s con- 
siderable integrity, where — as one of the 
smartest and most diligent scientists around 
— he describes himself as somewhat lost. 
He has surveyed the present, and all the best 
paths forward proposed by other scientists; 
yet, at gut level, he is confident that nothing 
on the horizon is fully satisfactory, and that 
other possibilities might be out there. These 
admissions imply that Weinberg suspects 
future historians may harbour a perception 
of today’s thinking very different from ours. 

The articles on public and personal 
matters — Weinberg’s thoughts about taxes, 
his disappointment with former US Presi- 
dent Barack Obama for failing to confront 
economic disparities more directly, and 
educators who are honoured with burial in 
Texas State Cemetery — are less interest- 
ing. Yet here, as elsewhere, he is clever: “The 
only technology for which the manned space 
flight program is well suited is the technol- 
ogy of keeping people alive in space. And the 
only demand for that technology is in the 
manned space flight program itself”. 

I read the penultimate essay with anticipation. 
Weinberg reveals in the introduction 
that he had not published it before because 
nobody who read it liked it. You can see why. 
It’s an earnest piece of amateur philosophizing 
that compares creativity in theoretical physics 
to that in the arts, building on Weinberg’s feel- 
ing that success in each depends on “a sense of 
inevitability”. He would have been clearer had 
he consulted a few philosophical concepts, 
such as Immanuel Kant’s idea that a beautiful 
work creates the feeling that it has the inevi- 
tability of a product of nature. But Weinberg 
(whose 1992 book Dreams of a Final Theory 
contains a chapter entitled ‘Against Philoso- 
phy’) was not about to take that route. In all 
these essays, his directness both enlightens 
and illuminates flaws in his own arguments. 
Witnesses, of course — even ideal ones — 
have blind spots. The best thing about Wein- 
berg’s essays is that they do, indeed, make you 
feel like a smart old attorney. 

Make databases 
language-proof 


It is absurd to put effort and 
public resources into research that 
has already been published. This 
will continue to be a risk as long 
as papers in non-English journals 
are not routinely indexed in the 
international databases (see also 
J. Lebel and R. McLean Nature 
559, 23-26; 2018). 

Some national databases 
offer a partial solution (see 
J. Tao et al. Nature 557, 492; 
2018). For example, Ukraine's 
Panteleimon database 
(http://www.panteleimon.org) 
translates the title, abstract and 
some figure legends and tables 
into English. Nevertheless, 
people should never cite research 
that has not been read in full. 

The scientific community 
needs to develop a 
comprehensive multi-language 
translation tool with the help 
of services such as Google 
Translate. This would enable 
international researchers to 
access regional databases not 
compiled in English and to find 
out all the essential details — for 
instance, regarding experimental 
design and results, or whether 
the paper was peer-reviewed. It 
would also resolve problems of 
priority and giving proper credit. 
Daniel Prieto Instituto de 
Investigaciones Biologicas 
Clemente Estable, Montevideo, 
Uruguay. 
dprieto@fcien.edu.uy 


Beyond replicability 
in the humanities 


The humanities should take 
responsibility for quality in 
the same way the sciences 
do, argue Rik Peels and 
Lex Bouter, through the pursuit 
and institutionalization of 
replicability (Nature 558, 372; 
2018). We disagree: quality 
criteria are crucially different in 
the humanities and the sciences. 
The humanities pursue 
meaning beyond truth. 
Confirming that Van Gogh 
painted Sunset at Montmajour 
(truth) is only the beginning. 
Unearthing the cultural meaning 
of the work requires historical 
context and theorizing on its 
message, style, aesthetics — and 
what the work can tell us about 
the artist and his world (view). 
The coexistence of multiple valid 
answers and the value of their 
interaction disqualify replication 
as a viable quality criterion. 

Moreover, the humanities 
relate differently to their objects 
of study. They focus on both 
interactive kinds (people) and 
indifferent kinds (atoms, DNA 
sequences, paintings). Extracting 
meaning from interactive data 
requires continued interaction 
between informants, who might 
resist or embrace preliminary 
results or classifications. With 
co-producers of data and 
meaning, protocols are never set 
in stone, reporting guidelines are 
necessarily local and consent is 
always fluid. 

Replication is a mark of quality 
only in the construction of truth 
for indifferent kinds. Extracting 
meaning from interactive 
kinds requires evaluation and 
assessment according to different 
quality criteria — namely, 
those that are based on cultural 
relationships and not statistical 
realities. 

Sarah de Rijcke Leiden 
University, Leiden, 
the Netherlands. 
Bart Penders Maastricht 
University, Maastricht, 
the Netherlands. 
s.de.rijcke@cwts.leidenuniv.nl 


Help relieve poverty 
with solar power 


Of China’s ten poverty- 
alleviation projects, its 
development of photovoltaic- 
based solar power has been 
one of the most successful. We 
suggest that other countries look 
more explicitly at solar energy 
as a way of generating income 
in rural areas, in accord with 
the United Nations Sustainable 
Development Goal to eradicate 
global poverty by 2030. 

China's overall programme 
has lifted more than 50 million 
rural people out of poverty since 
2013 (Y. Zhou et al. Land Use 
Policy 74, 53-65; 2018). Solar- 
energy schemes launched in 
2014 supplied 7.9 gigawatts of 
power by the end of 2017, directly 
benefiting some 800,000 poverty- 
stricken families (see go.nature. 
com/2jtdxjh; in Chinese). In 
Lixin county in central China, 
for example, solar installations 
provided an additional annual 
income of more than 3,000 yuan 
(around US$440) for every family. 
Solar-power facilities provide 
employment opportunities, boost 
farmers’ incomes and supply 
households with affordable, 
reliable and sustainable energy, 
thus also helping to alleviate 
energy poverty. 
Yang Zhou, Yansui Liu Institute 
of Geographic Sciences and 
Natural Resources Research, 
Chinese Academy of Sciences, 
Beijing, China. 
liuys@igsnrr.ac.cn 


Land use must abide 
by peace agreement 


A resolution signed in June to 
allow agricultural development 
on 35% (40 million hectares) 
of Colombia's land could risk 
compromising the government's 
2016 Peace Agreement with 
the Revolutionary Armed 
Forces (see also Nature 558, 
169-170; 2018). The agreement 
places strict controls on the 
transformation of national 
lands and environmentally 
important areas. 

At present, just 20% of that 
land is under cultivation. How 
the other 80% may be used is 
unspecified, but we fear that 
ecologically friendly farming and 
traditional production systems 
— such as cattle ranching 
in flooded savannahs in the 
Orinoquia region — are likely to 
be replaced by more-intensive 
forms of land exploitation. 

The expansion threatens 
the peace process and 
prospects for sustainable rural 
development — already a 
challenge in a country where 
only 16% of the soil is legally 
protected against degradation 
(see go.nature.com/2v997uy). 
Luca Eufemia, Michelle 
Bonatti Leibniz-Centre for 
Agricultural Landscape Research, 
Müncheberg, Germany. 
Marcos A. Lana Swedish 
University of Agricultural 
Sciences, Uppsala, Sweden. 

Rectify biased take 
on science history 


As members of the STEM 
Advocacy Institute, we find the 
typical Western view of science 
history distorted and incomplete 
and argue for more-balanced 
representation. Many non- 
Western scientists have made 
hugely important contributions 
to scientific knowledge, but their 
rich and inspiring stories garner 
little attention in the West. 

For example, Hippocrates 
is widely considered to be the 
‘father of medicine’ — even 
though the ancient Egyptians 
developed medicine as a 
profession 2,000 years earlier (see 
www.ancient.eu/imhotep). The 
first known physician in Egypt 
was a man named Imhotep, who 
was deified after his death for 
his medical achievements (see 
go.nature.com/2uxs5qd). Many 
such examples exist, but they are 
not well-known (see, for instance, 
J. Al-Khalili Nature 518, 164-165; 
2015; A. M. Celâl Şengör Nature 
471, 162-163; 2011; J. Poskett 
Nature 550, 332; 2017). 

This means that schoolchildren 
are inculcated with a history 
that excludes the diversity of 
ethnicities, beliefs and cultures 
that have contributed to today’s 
science, technology, engineering 
and mathematics. Ignoring these 
reinforces stereotypes and the 
marginalization of certain groups, 
whereas balancing the narrative 
would positively influence those 
who are already disadvantaged in 
our classrooms. 
Aiza Kabeer Manchester, New 
Hampshire, USA. Jessica W. Tsai 
Boston, Massachusetts, USA. 
scholarship@stemadvocacy.org 

CARBON CYCLE 


Microbes weaken soil carbon sink 


The rate at which carbon dioxide is lost from soil has risen faster than the rate at which it is used by land plants, because soil 
microbes have become more active — possibly weakening the land surface’s ability to act as a carbon sink. SEE LETTER P.80 


KIONA OGLE 


The terrestrial land surface has a 
crucial role in the global carbon cycle, 
providing feedbacks to changes in 
atmospheric levels of carbon dioxide and 
associated climate change. Increases in atmospheric 
CO₂ concentrations and in soil and air 
temperatures worldwide over the past several 
decades have been paralleled by an increase in 
the metabolism of organisms at the land surface, 
as demonstrated by enhanced rates of 
CO₂ uptake, mainly by plants through photosynthesis, 
and of CO₂ loss from plants and soil 
microorganisms, mostly owing to respiratory 
processes. On page 80, Bond-Lamberty 
et al. report that the rate of increase of CO₂ 
loss is outpacing that of CO₂ uptake by plants. 
The authors attribute the imbalance in these 
rates of increase to enhanced activity of 
microbes that obtain nutrition by decomposing 
or mineralizing organic matter in soil. If 
the observed trend continues, then respiration 
by microbes could contribute substantially to 
global warming by releasing CO₂ from organic 
matter that has previously been stored in soil 
for decades to millennia. 

A variety of processes underlie the exchange 
of CO₂ between the land surface and the 
atmosphere. Bond-Lamberty et al. focused on 
soil respiration, which is arguably one of the 
largest fluxes of CO₂. The authors analysed 
previously published soil-respiration data 
from many sites around the world that covered 
a broad range of ecosystems, including 
cropland, temperate forest and desert. They 
used these data to estimate the annual rates of 
soil respiration at various sites and to evaluate 
trends between 1990 and 2014. 

Bond-Lamberty and colleagues then compared 
trends in soil respiration (CO₂ loss) 
with those of plant productivity (CO₂ uptake) 
that were derived from different data sources, 
including satellites. They found that the ratio 
of the rate of soil respiration to that of plant 
productivity has, in general, increased over 
the period covered by their data set. The ratio 
rarely exceeded 1, except at certain sites in 
particular years, which indicates that specific 
situations can lead to more CO₂ being lost 
from soil than is taken up by plants. 
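
The ratio described here can be made concrete with a small sketch. It is an illustration only and not the authors' analysis: the function names and the flux values are hypothetical, invented so that one site-year exceeds a ratio of 1, and both fluxes are assumed to be expressed in the same units.

```python
def flux_ratio(soil_respiration, plant_productivity):
    """Return CO2 lost through soil respiration divided by CO2 taken up by plants.

    Both arguments are annual fluxes in the same (arbitrary) units.
    """
    if plant_productivity <= 0:
        raise ValueError("plant productivity must be positive")
    return soil_respiration / plant_productivity


def loses_more_than_it_gains(soil_respiration, plant_productivity):
    """True when the ratio exceeds 1, i.e. soil loses more CO2 than plants take up."""
    return flux_ratio(soil_respiration, plant_productivity) > 1.0


# Hypothetical site-years, invented for illustration (not data from the study).
SITE_YEARS = [
    ("site A, early year", 60.0, 100.0),
    ("site A, later year", 75.0, 100.0),
    ("site B, unusual year", 110.0, 100.0),
]

if __name__ == "__main__":
    for label, respiration, productivity in SITE_YEARS:
        ratio = flux_ratio(respiration, productivity)
        flag = "above 1" if ratio > 1.0 else "at or below 1"
        print(f"{label}: ratio = {ratio:.2f} ({flag})")
```

In these invented numbers, the rise at site A stands in for the general upward trend in the ratio, and site B stands in for the rare site-year in which soil lost more CO₂ than plants absorbed.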

The findings raise the question of whether 
the average global ratio could become greater 
than 1 in the future, and, if so, when? Such an 
event would mark the tipping point at which 
the land surface stops operating mainly as a 
sink that helps to remove atmospheric CO₂ that 
is derived from fossil-fuel emissions, and 
starts acting as a source of CO₂, exacerbating 
rising CO₂ levels and accelerating the pace of 
climate change (Fig. 1). 

Figure 1 | Tipping the balance of carbon fluxes at Earth’s land surface. a, The main sources of carbon 
dioxide from the terrestrial biosphere are respiration that is associated with the decomposition of organic 
matter in soil by microbes, and respiration of plants (both aboveground and belowground). Plants also 
absorb CO₂ through photosynthesis. At present, the total amount of CO₂ that is absorbed by plants 
exceeds the amount that is produced by respiration, and so the land surface acts as a carbon sink. Arrow 
widths roughly correspond to the sizes of the CO₂ fluxes. b, Bond-Lamberty et al. report that, over the 
past few decades, the rate at which CO₂ is produced by soil microbes has increased faster than that at 
which CO₂ is used by plants. This raises the possibility that the rate of respiration will reach a tipping 
point at which it overtakes the rate of CO₂ uptake by plants. Under such a scenario, the land surface would 
act as a source of atmospheric CO₂. The time at which such a tipping point would be reached is unclear. 

The authors next focused on studies in their 
data set that broke down total soil respiration 
into respiration dominated by decomposition 
by microbes and that associated with 
plant roots. Analysis of the microbial-dominated 
respiration rates led them to conclude 
that the disproportionately faster increase in 
the rate of total soil respiration is due to the 
enhanced activity of soil microbes. However, 
to understand whether accelerated rates of 
soil respiration will cause the land surface to 
become a source of CO₂, the temporal trends in 
respiratory losses associated with aboveground 
plant biomass must also be considered: the 
total loss of biologically derived CO₂ from the 
terrestrial biosphere is the sum of the soil and 
non-soil losses. 

As Bond-Lamberty et al. acknowledge, 
previously published long-term data 
recorded by eddy-covariance towers, which 
continuously monitor CO₂ concentrations 
and fluxes at specific sites across a range of 
ecosystems, suggest that the rate of increase of 
plant productivity has been faster than that of 
the total aboveground and belowground respiratory 
CO₂ losses. Further data and analyses 
are required to explain why those findings 
apparently contradict the authors’ results. 

If Bond-Lamberty and colleagues’ findings 
are correct, which mechanisms could explain 
the markedly enhanced stimulation of the 
activity of soil microbes relative to plant productivity 
and plant respiration? Studies in the 
past few years have shown that the ability of 
plants to downregulate respiration in response 
to long-term increases in temperature is 
much greater than that of short-lived soil 
microbes. The authors suggest that the 
increased microbial activity observed in their 
study probably reflects the stimulatory effects 
of elevated temperatures associated with 
climate change. 

There are, however, potential issues when 
drawing global inferences from the data ana- 
lysed by Bond-Lamberty and co-workers. 
Most of the data came from spot measure- 
ments of soil-respiration rates that were 
obtained by many different researchers, who 
used a variety of methods to work out the 
contributions of soil microbes. This diversity 
of methods might have led those researchers 
to come to contrasting conclusions about the 
relative importance of soil microbes in their 
studies. Moreover, Bond-Lamberty et al. used 
simplifying assumptions to translate hourly or 
daily snapshots of respiration rates into annual 
fluxes of CO₂, but did not take into account 
the uncertainty in these calculations. The 
soil-respiration data set is also limited in its 
temporal coverage of individual sites: repeated 
observations were available for only a handful 
of sites, yet recurrent observations are neces- 
sary to prevent temporal trends from being 
obscured by factors that vary between sites. 

The authors acknowledge and account for 
some of these limitations in their statistical 
analyses, but clearly there is room for a more 
rigorous investigation. This would require 
researchers to gather continuous time series 
of soil respiration and its component fluxes, 
and demands the use of precise methods for 
quantifying uncertainty and for extrapolating 
local measurements to determine trends in 
larger regions. Despite the limitations, Bond- 
Lamberty and colleagues’ work is valuable 
because it aids our understanding of soil’s long- 
term potential for sequestering carbon — as 
well as how this sequestration might be threat- 
ened by accelerated rates of organic-matter 
decomposition by soil microbes. Their find- 
ings will be crucial for developing and testing 
models of the global carbon budget, of which 
soil carbon is a central component. 

Fluxes of CO₂ across whole ecosystems are 
often measured using eddy-covariance towers. 
By contrast, continuous measurements of soil 
respiration and decomposition by microbes 
are not broadly available for sites worldwide or 
do not cover multi-year periods. The establish- 
ment of long-term observational projects such 
as the US National Ecological Observatory 
Network (NEON), which monitors fluxes of 
soil CO₂ among other ecological measures, will 
create opportunities for the systematic evalu- 
ation of temporal trends and the underlying 
causes of changes in the rates at which CO₂ is 
lost from soil. Such data will be paramount for 
developing regional and global models of the 
carbon cycle, as well as for assessing climate 
change and the strategies by which it might 
be mitigated. 

Peptide secretion 
triggers diabetes 


An autoimmune attack on cells that make the hormone insulin causes type 1 
diabetes. A mouse study reveals that pancreatic-cell release of insulin peptide 
fragments into the bloodstream triggers this harmful process. SEE LETTER P.107 


JIAJIE WEI & JONATHAN W. YEWDELL 


Diabetes arises from problems in the 
regulation of blood glucose, which 
is controlled by releasing the hormone 
insulin. The amount of insulin made is 
abnormally low in type 1 diabetes owing to the 
autoimmune-mediated destruction of insulin- 
producing β-cells in the pancreas. Wan et al. 
reveal on page 107 how this immune attack is 
triggered. 

Type 1 diabetes was a lethal condition with 
a life expectancy of just months until the 
discovery of insulin in 1921 enabled clinical 
management by insulin injection. Although 
this therapy greatly extends life expectancy, a 
deeper understanding of the disease is needed 
to develop treatments that delay or prevent 
disease onset. 

Studies of non-obese diabetic (NOD) mice, 
which spontaneously develop the disease, have 
provided insights into the mechanisms that 
cause the condition. Such work has revealed 
that T cells of the immune system have a key 
role in destroying insulin-producing pancreatic 
β-cells. T-cell immunosurveillance is aided 
by antigen-presenting cells, which present pep- 
tide fragments called antigens on their surface 
bound to major histocompatibility complex 
(MHC) class I and II molecules. T cells typi- 
cally encounter antigen-presenting cells in 
the lymph nodes, and their T-cell receptor 
samples the antigens presented on MHC mol- 
ecules. T cells that respond to self-antigens are 
normally eliminated as they mature in the thy- 
mus gland, but imperfections in this process 
can lead to autoimmunity. Although many 
proteins present in pancreatic β-cells could 
potentially provide the autoimmune trigger 
for type 1 diabetes, insulin is the culprit in 
NOD mice. 

In type 1 diabetes in humans and NOD 
mice, the presentation of a peptide consisting 
of amino acids 12 to 20 of insulin’s B chain 
(B:12-20) by MHC class II molecules can activate 
CD4 T cells that recognize this peptide. 
In NOD mice, a class II MHC molecule called 
I-Ag7 presents B:12-20 to CD4 T cells and activates 
them. These cells then initiate a process 
that activates CD8 T cells, which are specific 
for other β-cell peptides. Activated CD8 cells 
kill β-cells, leading […] 
risk of a person developing type 1 diabetes 
is often linked to 
MHC class II genes, which are among the 
most highly variable human genes. 
The gene encoding the version of MHC class II 
called HLA-DQ8 is tightly associated with the 
disease. Remarkably, HLA-DQ8 has a highly 
similar peptide-binding specificity to that of 
I-Ag7 (ref. 6). 

Wan and colleagues confirmed the 
pathological potential of T cells that are specific 
for I-Ag7–B:12-20 complexes by introducing 
such T cells into NOD mice under conditions 
that generate enough CD4 T cells to cause 
type 1 diabetes. As a control, the authors did 
a similar transfer into NOD mice that have a 
mutation in insulin that renders the antigen 
non-immunogenic. Several months later, the 
authors sequenced RNA from T cells derived 
from the transferred cells. These T cells 
expressed genes characteristic of an activated 
state only in the mice that had the immunogenic 
version of the antigen. Moreover, when 
these activated T cells were transferred into 
NOD mice that lacked T cells, they caused 
type 1 diabetes much more rapidly than when 
T cells from the control mice were used instead, 
confirming the role of these cells and this 
specific antigen in the generation of diabetes. 

Figure 1 | Insulin and type 1 diabetes. If T cells launch an autoimmune attack on pancreatic β-cells, 
which produce insulin protein, it causes diabetes. The source of the specific insulin peptides that can 
prime the T cells that give rise to diabetes has remained unknown. a, Wan et al. have now identified this 
source. Glucose from the blood enters β-cells through the GLUT channel protein, triggering the release 
of insulin from granules into the blood. The group found that old insulin granules are destroyed by fusing 
with an organelle called the lysosome to form another organelle known as the crinosome, in which insulin 
peptides arise. The peptides enter the bloodstream together with full-length insulin after glucose influx. 
b, The authors report that lymph nodes (not shown) are a site where an insulin peptide that might trigger 
diabetes (such as amino-acid residues 12 to 20 of the insulin B chain; B:12-20) can bind to a receptor 
termed I-Ag7 on an antigen-presenting cell (APC) of the immune system. When such a peptide-bound 
APC encounters a T cell that recognizes the peptide through its T-cell receptor (TCR), this activates the 
T cell and sets in motion a process that can cause diabetes. 

The authors sought to investigate how 
and where the diabetes-causing CD4 T cells 
are activated. They devised a highly sensi- 
tive and technically challenging microscopy 
approach that measures T-cell recognition 
of I-Ag7–B:12-20 complexes by monitoring 
a decrease in the mobility of CD4 T cells in 
mouse lymph nodes transferred from animals 
into a culture medium that promoted T-cell 
proliferation. This approach builds on previ- 
ous findings that presentation of this antigen 
occurs in lymph nodes throughout the body, 
not just in those that drain from the pancreas. 
Control experiments established that CD4 
T cells are not activated by this antigen in 
NOD mice lacking I-Ag7 or in animals that 
have an insulin mutation that prevents B:12-20 
peptide binding to I-Ag7. 

How do antigen-presenting cells in the 
lymph node acquire insulin peptides? One 
possibility is that they take up and process full- 
length insulin, given that these cells express 
insulin-binding receptors. However, when mice 
received a drug that blocks insulin binding and 
uptake, this did not block insulin recognition 
by CD4 T cells in lymph nodes. 

So where are these insulin peptides 
generated? The authors turned their attention 
to pancreatic β-cells. These cells contain large 
amounts of insulin stored in granules, which 
are released into the bloodstream when blood 
glucose rises after a meal. As a quality-control 
measure, insulin has an ‘expiration date’, and 
old insulin granules are ‘retired’ and degraded 
in an organelle called a crinosome, which 
forms when a granule fuses with an organelle 
called a lysosome (Fig. 1). 

By using a mass-spectrometry technique to 
identify peptides, Wan et al. found that crino- 
somes, but not insulin granules, contain sub- 
stantial amounts of insulin peptides that are 
associated with diabetes. The authors deter- 
mined that intravenously administered insulin 
peptides can rapidly reach antigen-presenting 
cells in lymph nodes, consistent with a model 
in which insulin peptides linked to diabetes are 
released into the bloodstream. 

By providing an explanation of how 
disease-causing T cells can be activated, 
Wan and colleagues’ findings raise many 
important questions. Given the short life 
expectancy for patients with untreated type 
1 diabetes, one might have expected robust 
selection pressure against HLA-DQ8. Yet 
it is present at extremely high frequency in 
many ethnic groups, suggesting that there is 
a strong counterbalancing selective advantage 
in retaining HLA-DQ8. And why isn’t there 
selection against the generation of secreted 
insulin peptides that trigger autoimmunity 
in humans? Perhaps the peptides have some 
positive functions, such as hormonal activ- 
ity, that might also explain the timing of their 
release from β-cells together with insulin on 
glucose stimulation. It will be interesting to 
learn whether different types of secretory cell 
release peptides generated in crinosomes, and 
whether this contributes to other autoim- 
mune diseases. 

Wan et al. show that the activation of 
diabetes-causing T cells in NOD mice is 
independent of antigen presentation by B 
cells or dendritic cells. Which type of antigen- 
presenting cell is therefore responsible? Per- 
haps the most puzzling question of all is how 
insulin peptides activate T cells in the absence 
of the inflammatory signals that are usually 
needed to trigger an immune response. NOD 
mice are maintained in pathogen-free condi- 
tions, so they are unlikely to have infections 
that could provide the necessary inflamma- 
tory cues. Microorganisms that naturally 
reside in NOD mice could potentially pro- 
vide immune triggers. Indeed, antibiotics 
can modulate type 1 diabetes susceptibility in 
NOD mice, implicating bacteria in disease 
pathogenesis. 

Although insulin supplements can provide 
a profound improvement in diabetes treat- 
ment, the situation is far from perfect and the 
life expectancy of those who develop type 1 
diabetes is decreased by more than a decade. 
By uncovering key steps in the development of 
this condition, Wan and colleagues’ findings 
might hasten the day when type 1 diabetes is 
relegated to medical history. 

Origin of 
the blues 


Some diamonds are a beautiful blue colour, 
because they contain trace quantities of 
boron. But blue diamonds are formed 
in Earth’s mantle, whereas boron is 
concentrated in the crust. So where did these 
diamonds get their boron from? On page 84, 
Smith et al. provide the answer to this 
geochemical conundrum (E. M. Smith et al. 
Nature 560, 84-87; 2018). 

The authors analysed minerals trapped 
in 46 blue diamonds, and from this worked 
out that the gemstones must have formed 
in the lower mantle. Their analysis also 
suggests that the material from which 
the diamonds formed contained water 
and came from the oceanic lithosphere 
(tectonic plates beneath the sea), a rich 
source of boron. 

The findings mean that blue diamonds are 
some of the deepest ever found. Moreover, 
they reveal a geochemical pathway that 
extends from the oceanic lithosphere at 
Earth’s surface to the lower mantle, and a 
potential route for the ultra-deep cycling of 
water in our planet. Andrew Mitchinson 


A cancer shortcut to 
the nervous system 


Certain cancers are prone to invade the nervous system, which leads to poorer 
prognosis. A study of leukaemia in mice reveals an unexpectedly direct invasion 
route from the bone marrow to the central nervous system. SEE ARTICLE P.55 


FRANK WINKLER 


If malignant cells from solid or blood 
cancers enter the central nervous system 
(CNS) and grow there, the treatment 
options and clinical outlook deteriorate 
rapidly. In a type of leukaemia called acute 
lymphoblastic leukaemia (ALL), invasion of 
the CNS commonly occurs. To try to limit this, 
people with the condition often receive radiation 
or chemotherapy that targets the CNS. 
If more-effective and less-toxic approaches 
became available to prevent disease spread to 
the CNS, this might benefit many people with 
ALL. On page 55, Yao et al. report a hitherto 
unknown route that ALL cells use to enter 
the CNS, and suggest a possible therapeutic 
approach that is worth investigating. 

When leukaemia spreads into the CNS, this 
process, termed metastasis, is mainly limited to 
the region known as the subarachnoid space, 
which contains cerebrospinal fluid that bathes 
the brain and spinal cord. The subarachnoid 
space is surrounded by membranes called the 
dura mater, the pia mater and the arachnoid, 
which are collectively called the meninges and 
are also colonized by cancer cells (Fig. 1). Yao 
and colleagues used mouse models of ALL to 
investigate how human leukaemia cells spread 
into the CNS. They focused on the enzyme 
PI3K, which is a key regulator of signalling 
pathways needed for growth, survival and 
invasion by cancer cells. 

The authors studied three different mouse 
models of ALL, and gave animals a molecule 
called GS-649443 that inhibits a version of 
PI3K called PI3Kδ. Animals that received the 
drug had reduced signs of cancer invasion of 
the CNS compared with those that did not 
receive it. The authors studied bone-marrow 
sites that are commonly rich in leukaemic 
cells, and found no evidence that the drug was 
affecting the growth or motility of leukaemic 
cells or cancer progression at sites outside the 
CNS. This result suggests that PI3K inhibition 
specifically affects the ability of leukaemic 
cells to enter the CNS — which is surprising, 
because the blood-brain barrier usually poses 
a formidable obstacle that restricts the ability 
of molecules or cells to exit blood vessels and 
enter the brain or the cerebrospinal fluid. 
Cancer cells that reach the CNS are likely to 
be beyond the reach of drugs present in the 
body’s bloodstream. 

Yao and colleagues analysed gene expres- 
sion in cancer cells to identify genes whose 
expression decreased as a result of PI3K 
inhibition. This pointed them towards the 
gene that encodes a receptor protein called 
α6 integrin. This receptor can bind a protein
called laminin, which is a key component of
the extracellular material that surrounds large
blood vessels. Laminin is also a component of
the meninges and is present in a CNS
structure called the choroid plexus, which is where
immune cells and cells from solid tumours
often enter the CNS.

Figure 1 | A route for cancer entry into the central nervous system. Cancer migration to the central
nervous system (CNS, which comprises the brain and spinal cord) is often associated with poor
prognosis. Yao et al. report that human blood-cancer cells in mice reach the central nervous system
by migrating along the external surface of blood vessels. This solves the mystery of how these cells
leave their site of origin in the bone marrow and reach a region called the subarachnoid space, which
contains cerebrospinal fluid. This is located in a region termed the meninges, which covers the CNS and
contains membrane layers called the dura mater, arachnoid and pia mater. The authors report that this
migration process depends on an enzyme called PI3Kδ (not shown), and requires a receptor protein
called α6 integrin on cancer cells. This receptor can bind to the protein laminin, which coats the surface
of blood vessels and is also found in the meninges. Laminin aids the migration of neuronal cells during
development, so perhaps these cancer cells have hijacked components of a natural migration process.

When solid brain tumours arise from a 
cancer that has spread from elsewhere in the 
body, a common theme is tumour-cell entry 
into the brain by a process that requires
signalling by a type of integrin protein called
β1 integrin, which binds to laminin-containing blood
vessels deep in the brain. However, when
Yao and colleagues analysed brain cells using 
microscopy techniques, they found no evi- 
dence that the choroid plexus or these vessels 
in the brain are places where ALL cells enter 
the CNS, confirming the results of previous 
human and mouse studies.

Laminins and laminin-binding integrin
receptors act in neuronal path-finding
processes during development, and are also key
mediators of migration processes for healthy
and cancerous cells. Using microscopy
approaches in their mouse models, Yao and
colleagues discovered that small blood vessels
coated with laminin that connect the bone
marrow of the skull and the nearby meninges
are sites with high levels of ALL cells.
Transit of ALL cells along the external surface of
these vessels could provide a direct route for
the cells to reach the CNS. Previous studies
have led to speculation about whether ALL
cells spread to the CNS by a direct route, given
that the outer layer of the meninges (the dura
mater) can have a high level of infiltration
by ALL cells.

Conducting studies in vitro, the authors 
found that cerebrospinal fluid contains 
chemokine molecules that can provide a 
chemical attractant for ALL cells. Yao and 
colleagues’ work also indicates that ALL-cell 
migration towards laminin depends on the 
presence of α6 integrin and can be blocked by
PI3Kδ inhibition. The authors observed that
treating mice with GS-649443 consistently
diminished the invasion of ALL cells along
the vascular corridors between the bone
marrow and CNS. Moreover, giving animals an
antibody that blocks α6 integrin prevented
ALL spread to the CNS without having strong
effects on ALL progression outside the CNS.
Finally, the authors carried out a limited
investigation of clinical samples from 26 people with
ALL and found that a higher level of
expression of α6 integrin correlated with a higher
probability that a person’s cancer had invaded 
the CNS. 

Interestingly, the ability of cancer cells to 
hijack pathways required for neurodevelopment 
has been demonstrated for key steps leading to 
brain-tumour progression, which suggests that
this might be a key mechanism underlying how 
cancer cells successfully colonize the CNS. 

Efforts to prevent cancers from reaching the 
CNS might be a particularly effective approach 
for improving the clinical outlook for many of
these diseases. In a mouse model of tumour
progression to brain sites, inhibition of the
protein VEGF-A hinders a crucial early switch
needed for the formation of tumour blood
vessels. In ALL, VEGF-A expression levels are
high, and a VEGF-A blocking antibody can
prevent the disease invading the meninges
of the CNS. Yao and colleagues found that
PI3Kδ inhibition results in substantially lower
expression of VEGF-A (see Extended Data
Fig. 4b of the paper), raising the question of
whether this protein is also involved in the col- 
onization of the CNS in ALL. Because blood- 
vessel formation is not likely to be involved for 
ALL, given that it is not a solid cancer, perhaps 
VEGF-A alters the blood vessels that penetrate 
the CNS, to make them more likely to provide 
routes for ALL invasion. 

Questions remain about the exact route of 
ALL entry to the CNS. Do the microvessels 
that directly bridge this route and penetrate 
tiny holes in the skull, as demonstrated by Yao 
and colleagues in their mouse studies, also exist 
in humans? If cancer cells follow the surfaces 
of known larger blood vessels to travel from 
the bone marrow to the subarachnoid space 
in humans, such a route would be of consider- 
able anatomical complexity. Moreover, cancer 
cells from solid tumours regularly colonize the 
bone marrow and frequently invade bone, so 
it would be interesting to determine whether 
such cells can also use the route taken by ALL 
to reach the subarachnoid space. 

The presence of laminin at all known sites of 
cancer-cell entry to the CNS, and the ability of 
laminin exposure to boost the survival of
cancer cells, clearly indicates that more-extensive
studies to investigate laminin’s role in
cancer-cell entry to the CNS are called for. Which
versions of integrins and laminin are involved,
and which exact routes are used by cancer
cells? Finally, Yao and colleagues’ insights
could be relevant to the clinic. Another PI3Kδ
inhibitor, called idelalisib, is already in use to
treat certain blood cancers. Could this drug
finally offer a way to target the CNS entry of
ALL or other types of cancer?
ACOUSTICS


Negative refraction 
without reflection 


At the interface between two facets of an artificial crystal, sound waves can be 
transmitted in the opposite direction to that expected, and undergo no reflection. 
Such wave behaviour could have many applications. SEE LETTER P.61 


BAILE ZHANG 


Waves change direction when they
pass from one medium to another,
a phenomenon called refraction.
This effect underlies most optical lenses and
instruments, and is widely found in acoustics
when an acoustic beam behaves like an
optical beam. In general, some of the waves are
reflected during the refraction process. On
page 61, He et al. report an impressive
demonstration of a previously unobserved
refraction phenomenon. They show that, in a certain
artificially engineered material, an acoustic 
beam can be refracted in the opposite direc- 
tion to that seen in ordinary materials, without 
reflection. The authors’ findings could lead to 
improved control of waves in electronic and 
photonic systems. 

When an acoustic or optical ray strikes the 
interface between two different media, part 
of its energy passes through the interface to 
form a refracted ray (Fig. 1a). The remaining
energy reflects from the interface to produce 
a reflected ray. In nature, the incident and 
refracted rays are always on opposite sides of 
the normal — an imaginary line perpendicular 
to the interface. But, in theory, this need not 
be the case. 

In 1968, the Russian physicist Victor 
Veselago considered a hypothetical material 
that has a negative refractive index. A
refractive index describes how waves propagate in
a medium, and is positive in all conventional 
materials. Veselago showed that the way in 
which refraction usually occurs could be 
reversed in a negative-index material: the 
refracted ray could emerge on the same side 
of the normal as the incident ray (Fig. 1b). 
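
As a rough illustration of the geometry Veselago described, Snell's law (n1 sin θi = n2 sin θr) can be evaluated for a positive and a negative refractive index. The sketch below assumes idealized, lossless media described only by a scalar index, which is a textbook simplification and not the model used by He et al.; the function name and the index values are illustrative assumptions.

```python
import math


def refraction_angle(n1, n2, theta_i_deg):
    """Refraction angle in degrees from Snell's law: n1*sin(theta_i) = n2*sin(theta_r).

    Angles are measured from the normal. A positive result places the
    refracted ray on the opposite side of the normal from the incident ray
    (conventional refraction); a negative result places it on the same side
    (negative refraction). Returns None when no refracted ray exists
    (total internal reflection).
    """
    s = n1 * math.sin(math.radians(theta_i_deg)) / n2
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


# Illustrative values only: a wave passing from index 1.0 into index +1.5 or -1.5.
conventional = refraction_angle(1.0, 1.5, 30.0)
negative = refraction_angle(1.0, -1.5, 30.0)
print(f"conventional medium: {conventional:.2f} degrees")
print(f"negative-index medium: {negative:.2f} degrees")
```

Under these assumptions the two angles have equal magnitude and opposite sign, which is the mirror-image geometry of the refracted ray about the normal. The sketch says nothing about how much energy is reflected.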

Although intriguing, negative refraction did 
not trigger much attention, and was consid- 
ered impossible for more than 30 years because 
it was thought that negative-index materials 
could not exist. The situation changed in 2000, 
when the British physicist John Pendry made a 
shocking prediction: that negative refraction
could be used to make a lens that could focus 
light more tightly than is normally possible. 
He also identified a practical way to construct 
negative-index materials in the lab using arti- 
ficial structures. Such materials, now gener- 
ally referred to as metamaterials, stimulated 
research into concepts such as invisibility
cloaking that had previously existed only in
science fiction. 

In the years since Pendry’s work, the pursuit 
of negative refraction has led to developments 
in optics, acoustics, plasmonics (the study of 
how light interacts with electrons in metals) 
and even graphene-based electronics.
Versions of negative refraction have been realized
in each of these areas. However, the phenom- 
enon is generally accompanied by reflection, 
which is often undesirable. In many cases, such 
as in experiments involving the refraction of 
electrons through an interface, reflection can
even dominate negative refraction. 

The property of reflection immunity is not 
found in natural optical materials for light. 
However, it does occur in exotic phases of 
matter known as topological quantum matter, 
for quantum-mechanical electronic waves. A 
well-studied example is the topological insu- 
lator, which is an electrical insulator in its 
interior, but conducts electricity on its surface 
through electronic waves called topological 
surface states. Such states are able to propagate 
unidirectionally — they bypass obstacles and 
defects, rather than being reflected. 

He and colleagues’ demonstration was 
directly inspired by another emerging
topological quantum matter: the Weyl semimetal.
The topological surface states in this material 
cannot propagate in all directions; propaga- 
tion is confined to a certain range of direc- 
tions, which connect to form what are known 
as Fermi arcs. Because the limited range of
propagation directions does not include the 
direction in which reflection would normally 
occur, reflection is forbidden (Fig. 1c). 

In their experiment, He et al. used an 
artificial crystal that is an acoustic analogue 
of the Weyl semimetal. They found that, at the 
interface between two adjacent facets of the 
crystal, airborne acoustic waves could undergo 
negative refraction without reflection. The 
authors’ results represent the first realization
of negative refraction for topological surface 
states. 

There are a few limitations of the work. For 
instance, the refraction does not occur in a flat 
plane, contrary to the common impression of 
refraction. Moreover, the interface scatters 
some of the acoustic waves into the crystal’s 
interior, resulting in energy loss. Nevertheless, 
the demonstration opens the door to many 
exciting opportunities for further research. 

The immediate question is whether He and 
colleagues’ refraction phenomenon could be 
realized in optical systems for light and con- 
densed-matter systems for electrons. Another 
question, which will be of interest to both opti- 
cal and condensed-matter physicists, is how to 
engineer the range of propagation directions 
— and, in turn, the Fermi arcs — to achieve 
greater control of negative refraction. In this 
sense, the authors’ work provides the first 
practical use of Fermi arcs, which are currently 
being enthusiastically explored in
condensed-matter systems and in optical structures
called photonic crystals.

The refraction phenomenon could also find 
widespread use in acoustics. For example, the 
combination of negative refraction and zero 
reflection could lead to improved resolution 
in ultrasonic imaging and testing. Moreover, 
acoustic waves are used in biomedical micro- 
fluidic devices to trap, sort and deliver cells 
and drug particles. Reflection-free acoustic 
waves are strongly desirable in such applica- 
tions, because reflections at the interfaces and 
sharp corners of microfluidic channels are cur- 
rently a huge limitation to device efficiency. 


Figure 1 | Comparison of refraction phenomena. a, In conventional refraction, when an acoustic or 
optical ray (red) hits the interface between two different media, a reflected ray (dark blue) and a refracted 
ray (light blue) are produced. The incident and refracted rays exist on opposite sides of the normal — an 
imaginary line perpendicular to the interface. b, In negative refraction, the refracted ray emerges on the 
same side of the normal as the incident ray. c, He et al. report a previously unobserved type of refraction
for acoustic rays, in which not only are the incident and refracted rays on the same side of the normal, but 
also there is no reflected ray. (Figure adapted from ref. 1.) 
50 Years Ago


The prospect of beer by the litre
for British drinkers came a step
closer last week. Mr Anthony
Wedgwood Benn, the Minister of
Technology, announced that the
United Kingdom is to adopt the
metric system of measurement
by 1975 — the target date already 
accepted by British industry for its 
timetable ... The industrial change 
is going ahead fast ... but the non- 
industrial sector of the economy 
and the general public have been 
lagging behind ... It is “imperative” 
for the planning of the change in 
the general sectors of the economy 
to be put in hand; if this is not done, 
“the dynamism of the industrial 
change will be lost” ... The cost- 
effectiveness of metrication is not... 
likely to be known until it is a fact. 
So far, certainly, hunch has played a 
greater part than has sober analysis 
of the situation. 

From Nature 3 August 1968 


100 Years Ago 


... Finally, there is the personal use 
of insecticidal preparations as aids 
to the primitive method of getting 
rid of [lice] — now referred to as 
“chat”-hunting ... [T]he preparation 
should be of quick action and easy 
of application to clothing, and 
its issue should be as general and 
comprehensive as that of food... 
[P]astes are more economical 
and convenient than powders; 
fluids are out of the question. 
Crude “unwhizzed” naphthalene, 
produced by coke-oven plants, 
affords the most effective base, 
and may be conveniently mixed 
into paste form by the addition of 
soft soap or some grease, such as 
vaseline, in the proportion of 10 to 
20 per cent ... When it is necessary 
to use an anti-lice preparation on a
hair-clad surface the use of vaseline, to 
which has been added % per cent. of 
veratrine dissolved in 5 per cent. of 
benzene, may be recommended. 
From Nature 1 August 1918 
Topological acoustics is therefore a promising
research field that not only can produce
phenomena that are difficult to realize in other
physical systems, but could also bring about
transformative technologies.
An unexpected trigger
for calorie burning 


The molecule succinate, which is a product of metabolism, promotes heat 
production and therefore calorie burning in brown fat in mice. This discovery 
could have implications for combating obesity in humans. SEE LETTER P.102 


SHENG HUI & JOSHUA D. RABINOWITZ 


There are two ways to lose weight: eat
less to reduce the number of calories
available for metabolism by the body,
or burn more calories, for example through
exercise. On page 102, Mills et al. identify
a molecule produced during nutrient meta- 
bolism that, surprisingly, induces calorie 
burning. This metabolite, succinate, activates 
energy expenditure in brown fat. Remarkably, 
supplementing the drinking water of mice 
with succinate can prevent the animals from 
gaining weight. 

Brown fat is different from the white 
fat that builds up around our waistlines. 
Whereas white fat acts as an energy reserve, 
brown fat specializes in heat generation, and 
is essential for mammals to maintain their 
body temperature in the cold. Brown-fat
cells contain smaller lipid droplets than do
white-fat cells, and have many more
organelles called mitochondria, which enable
brown fat to generate heat. 

In mitochondria, a metabolic pathway 
called the TCA cycle breaks down nutrients 
such as glucose, lactate and fat into carbon 
dioxide, using the energy stored in the nutri- 
ents to generate high-energy electrons. These 
electrons are used to pump protons
(hydrogen ions, H⁺) out of the interior matrix of
the mitochondrion into the space between 
the organelle’s inner and outer membranes, 
thereby converting the energy into a proton 
gradient. Normally, protons re-enter the 
mitochondrial matrix through a membrane- 
spanning protein complex called the proton 
ATPase. This complex uses the energy stored 
in the proton gradient to convert ADP mol- 
ecules into energy-carrying ATP molecules, 
and thereby generates most of the body’s
usable energy. But in brown fat, protons pass 
through another protein, uncoupling pro- 
tein 1 (UCP1). This transporter uncouples the 
process of crossing the mitochondrial mem- 
brane from that of ATP production, effectively 
wasting the proton gradient’s energy as heat 
(reviewed in ref. 4). 

This capacity of brown fat to dissipate 
calories as heat has attracted much attention, 
in the hope of activating the process to combat 
obesity. To do this, it is necessary to know
what switches on calorie burning by brown
fat. At the macroscopic level, the main answer
is exposure to cold. At the mechanistic level, it
has been proposed that the brain senses cold
and sends signals to brown fat through a
process mediated by proteins called β-adrenergic
receptors. But drugs that activate these
receptors have not been successful in curbing
obesity. Thus, there is intense interest in finding
new pathways that activate heat generation in 
brown fat. 

Mills et al. began by searching for 
metabolites that are selectively abundant in 
brown fat, and whose concentration in this 
tissue increases during cold exposure. Their 
survey identified succinate, one of the meta- 
bolic intermediates of the TCA cycle. 

The TCA cycle is generally assumed to be a 
cell-intrinsic process in which most intermedi- 
ates are trapped in the mitochondrial matrix. 
Thus, most succinate is consumed by the same 
cell that produces it. Some succinate, how- 
ever, makes its way into the bloodstream. The 
authors provide evidence that a key trigger for 
the release of succinate may be muscle activity, 
because shivering in response to cold increased 
blood succinate levels in mice. 

Figure 1 | Circulating succinate molecules mediate calorie burning. Mills et al. report a mechanism
by which weight gain can be controlled in mice. Succinate added to drinking water is ingested and enters
the systemic blood circulation from the gut. Succinate is also released into the circulation by muscle cells
during shivering in response to cold. From the circulation, succinate can enter brown-fat cells, which
contain many organelles called mitochondria. Here, it triggers the mitochondrial protein UCP1 to leak
protons (hydrogen ions; H⁺), converting chemical energy into heat and thereby burning calories.

To trace the fate of succinate circulating in
the blood, Mills and colleagues injected mice
with succinate tagged by a heavy isotope of 
carbon. They found that the carbon isotope 
accumulated preferentially in brown fat. Thus, 
brown fat seems to be programmed to use cir- 
culating succinate as a fuel. Consistent with 
this, the authors showed that isolated brown- 
fat cells, but not most other cell types tested, 
avidly took up and burnt succinate. 

Mills et al. next showed that acute succinate 
administration in mice raised the local temper- 
ature of brown fat. And, strikingly, administer- 
ing succinate in drinking water for four weeks 
prevented obesity in mice on a high-fat diet. 
These metabolic effects depended on UCP1 
— most of the beneficial metabolic effects 
of succinate were absent in mice genetically 
engineered to lack this protein. Thus, succinate 
activates heat production and calorie burning 
in brown fat (Fig. 1). 

How exactly does succinate trigger heat 
production? In the TCA cycle, succinate is 
consumed by the enzyme succinate dehydro- 
genase. The activity of this enzyme produces 
molecules called reactive oxygen species 
(ROS), which have been proposed to promote 
heat generation by brown fat. The authors
therefore suggest that succinate accumulation 
induces calorie burning by increasing the activ- 
ity of succinate dehydrogenase and so increas- 
ing ROS levels. However, it is unclear whether 
the contribution of circulating succinate to the 
TCA cycle in brown-fat cells is really sufficient 
to alter ROS levels and heat generation. 

As an alternative explanation, perhaps 
succinate triggers a yet-to-be-discovered sig- 
nalling system in brown fat. Or perhaps circu- 
lating succinate is sensed in a different part of 
the body, such as the brain, which then signals 
to brown fat to activate heat production. Defin- 
ing the mechanism at work is of more than aca- 
demic interest — it might prove important in 
determining the ideal dose and schedule for
succinate administration in humans, or for
identifying pharmacological alternatives to 
bulk succinate intake. Finding the molecular 
players involved will be crucial, the most obvi- 
ous missing protein being the transporter that 
carries succinate into brown fat. 

Humans, of course, differ from mice in 
many ways. One of the most obvious is our 
larger body size, which is associated with a 
lower ratio of body surface area to mass. As 
a consequence, we are better at staying warm 
than are mice, but worse at getting rid of heat. 
It is probably for these reasons that brown 
fat makes up a much lower percentage of our 
body mass. Moreover, we lose brown fat as
we age. This could limit the extent to which
activation of metabolic processes in brown
fat can alter calorie expenditure.
Accordingly, methods to induce brown-fat
properties in existing white fat might be needed as a
complementary approach. It will nevertheless
be interesting to see whether succinate can
induce substantial calorie burning in humans.

Taking a step back, circulating TCA
intermediates have not previously been considered
as key factors in metabolism. But several TCA
intermediates are present in the circulation at
substantial levels, and some of them, such as
citrate, flow into and out of the bloodstream
to a greater extent than does succinate. The
finding that circulating succinate has a
well-defined, and perhaps even medically
important, metabolic role raises the possibility that
circulating TCA intermediates will more
generally prove to be vital metabolic players.

NEURODEVELOPMENT

Nascent neurons need
nature and nurture 


How genetic and environmental factors contribute to the generation of various 
subtypes of inhibitory neurons called interneurons in the brain is unclear. A 
study in mice provides new insight into this process. 


CHRISTIAN MAYER & GORD FISHELL 


The mature brain contains an enormous
variety of locally projecting inhibitory
neurons known as interneurons. How
the brain’s precise complement of interneurons
is generated during development is a subject of
lively debate. At its heart, this question is one
of nature versus nurture. Young interneurons
are ‘born’ in a region called the subpallium and
undergo a long migration to reach their final
positions in the brain’s cortex, but it remains
unclear how much of an interneuron’s mature
fate is bestowed by its genetic identity, which
is established when the cell stops proliferating,
and how much is acquired through nurture
during migration. Writing in Nature
Neuroscience, Lim et al. investigate how migration
influences cellular identity.

There is evidence to support roles for both 
nature and nurture in defining the identities 
of the different classes, types and subtypes of 
interneuron in the mature brain.
Intrinsic gene-expression programs
are thought to begin to define
interneuron identities in the embryo,
and to unfold over a lengthy period
of time. The expression of certain
genes remains conserved in
specific types of interneuron from
their birth through to adulthood,
whereas others affect interneuron
maturation more transiently during
early embryonic development.
Such intrinsic processes are thought
to cooperate with the interneuron’s
environment to establish neuronal
circuits and brain connectivity in the
adult. For example, after arrival in
the cortex, neuronal activity affects
several aspects of interneuronal
development.

The authors studied the migration 
pathways of two types of interneuron 
— one characterized in the adult by 
expression of the protein somato- 
statin, the other by expression of the 
protein parvalbumin. Both types are 
born in the same general region of 
the subpallium. Interneurons from 
this region reach the embryonic cor- 
tex predominantly by two migration 
routes: one that takes them through
the marginal zone above the cortex; 
and one that transits below the cor- 
tex, through the subventricular zone. 

Does the route taken by an imma- 
ture interneuron have an effect on 
the identity of the mature cell it will 
become? To find out, Lim et al. spe- 
cifically labelled somatostatin- or 
parvalbumin-expressing interneu- 
ron precursors in the marginal 
zone with a fluorescent protein, 
and observed the cells’ development. The 
authors found that both populations tend to 
develop complex projections at the end of 
their migration through the marginal zone. 
These projections, called translaminar axons, 
cross different layers of the cortex. This find- 
ing led the researchers to propose that migra- 
tion through the marginal zone influences the 
growth and branching of axons through some 
general mechanism. 

To examine this idea, Lim and colleagues 
performed time-lapse imaging experiments 
in brain slices grown in culture. Consistent 
with their hypothesis, interneurons ‘abseiled’ 
down into the cortex after travelling through 
the marginal zone. During this process, most 
of these cells anchored their nascent axon in 
the marginal zone like a trailing rope (Fig. 1). 
Thus, migration route and axonal develop- 
ment seem to be linked for these cells. 

Figure 1 | Intrinsic and environmental cues govern the
development of interneurons. Cells called interneurons become
progressively more diverse as they mature. Interneurons generated
in the subpallium of the mouse brain can be divided into a group
that will become mature cells expressing the protein somatostatin
(red), and another that will express the protein parvalbumin (blue).
Neurons from each group migrate to the cortex through one of two
routes: along the marginal zone (MZ) above the developing cortex
or along the subventricular zone (SVZ) below it. Lim et al. report
that an intrinsic cue (expression of the gene Mafb) leads cells
to migrate through the MZ, and to develop long axonal projections
when they move into the cortex, whereas cells that migrate through
the SVZ develop short local axons. However, the migration route
taken and development of projections also depend on environmental
cues, often involving neuronal activity (not shown). The orientation
of the brain is indicated in the inset, and by dotted arrows.

Next, the authors investigated the
consequences of deleting genes that are expressed
in somatostatin-interneuron precursors that
migrate through the marginal zone, but not
in those that pass through the subventricular
zone. The group found that deletion of one
such gene, Mafb, in these cells results in about 
a 20% decrease in the fraction of somatostatin- 
interneuron precursors migrating through 
the marginal zone. Moreover, those neurons 
that failed to migrate through the marginal 
zone lacked their characteristic translaminar 
projections. Finally, Lim et al. isolated migrat- 
ing interneurons from both routes and trans- 
planted them back to the beginning of the 
migration path in a cultured brain slice. Slightly 
more than 60% of the cells from the marginal 
zone entered the same migration path again, 
hinting that intrinsic differences between neu- 
rons influence which cells take which path. 

Taking their results together, Lim and 
colleagues conclude that, early in develop- 
ment, genetic factors determine what type 
of interneuron a cell will become, and direct 
the cell down the appropriate migratory path. 
However, there is also evidence from the 
current work for environmental effects on 
interneuron development. 

First, the effects of migration route on axonal 
branching seem to be largely independent of 
genetic priming, because somatostatin-
and parvalbumin-interneuron pre- 
cursors are similarly affected. Second, 
although Mafb-deficient cells that fail to 
migrate through the marginal zone lack 
their translaminar projections, they do 
retain other properties characteristic of 
cells that follow the marginal-zone route. 
Third, almost 40% of interneurons 
transplanted into brain slices from the 
marginal zone picked a different migra- 
tory route the second time round. It is 
therefore likely that stochastic processes 
are a major part of the distribution of 
interneurons between migration paths. 
A balance between predetermined and 
environmentally specified aspects of 
interneuron development seems to be 
emerging. 

One explanation for how such a bal- 
ance might work involves the expression 
of genes such as Mafb in newly born 
interneurons acting as a virtual ‘look-up
table’. In this scenario, the expression of
intrinsic signals might bias the response 
of developing neurons to subsequent 
environmental cues. As such, the combi- 
nation of early gene expression coupled 
with later environmental cues might 
jointly determine the cells’ final identity 
and connectivity. 

Determining the influences of envi- 
ronmental cues on particular interneu- 
ron populations requires the ability to 
selectively target those populations. 
The identification this year of genes
expressed early in development that 
are specific to particular interneuron 
populations promises to provide a 
way to probe the contributions of early 
intrinsic and later environmental cues 
in particular interneuron subclasses. 
Certainly, the present paper provides strong 
evidence for how the two aspects of
development are linked.
Machine learning at the energy and
intensity frontiers of particle physics 


Alexander Radovic, Mike Williams, David Rousseau, Michael Kagan, Daniele Bonacorsi, Alexander Himmel,
Adam Aurisano, Kazuhiro Terao & Taritree Wongjirad


Our knowledge of the fundamental particles of nature and their interactions is summarized by the standard model 
of particle physics. Advancing our understanding in this field has required experiments that operate at ever higher 
energies and intensities, which produce extremely large and information-rich data samples. The use of machine-learning 
techniques is revolutionizing how we interpret these data samples, greatly increasing the discovery potential of present 
and future experiments. Here we summarize the challenges and opportunities that come with the use of machine learning 
at the frontiers of particle physics. 


The standard model of particle physics is supported by an abundance of 
experimental evidence, yet we know that it cannot be a complete theory of 
nature because, for example, it cannot incorporate gravity or explain dark 
matter. Furthermore, many properties of known particles, including neutrinos 
and the Higgs boson, have not yet been determined experimentally, and the way 
in which the emergent properties of complex systems of fundamental particles 
arise from the underlying standard-model theory remains unknown. 

Many known particles were discovered using detectors that made subatomic 
particles visible to the human eye. For example, bubble chambers filled with 
superheated liquids that boil when charged particles pass through them 
transform the paths of the particles into visible tracks of bubbles, which can 
then be photographed and analysed. The detectors at the Large Hadron Collider 
(LHC) are much more complex and record data at far greater rates than is 
possible using bubble chambers. For example, the LHCb experiment analyses as 
many events every six seconds as the Big European Bubble Chamber recorded in 
its entire 11 years of operation (1973-1983), and the datasets collected by the 
ATLAS and CMS experiments at the LHC are comparable to the largest industrial 
data samples. It is impossible for humans to visually inspect such large 
amounts of data; algorithms running on large computing farms took over this 
task long ago. 

Over the past two decades, particle physics has been migrating towards the use 
of machine-learning methods in the collection and analysis of its large data 
samples. Pioneering studies that used neural networks and boosted decision 
trees (BDTs) in previous-generation experiments laid the groundwork for the 
emergence of machine learning as an essential tool at the LHC. Machine-learning 
algorithms made important contributions to the discovery of the Higgs boson 
and most data-analysis tasks now benefit from the use of machine learning. In 
parallel, the field of machine learning has developed at a rapid pace and, in 
particular, the subfield of deep learning has delivered superhuman performance 
in several domains. Incorporating these tools while maintaining the scientific 
rigour required in particle-physics analyses presents new challenges. This 
Review focuses on the application and development of machine-learning methods 
at the LHC, including recent advances based on deep learning. In addition, we 
present some example applications of deep learning within the subfield of 
neutrino physics, in which state-of-the-art methods, such as those from 
computer vision, are naturally applicable. 


Big data at the LHC 

The sensor arrays of the LHC experiments produce data at a rate of 
about one petabyte per second. Even after drastic data reduction by 
the custom-built electronics used to read out the sensor arrays, which 
involves zero suppression of the sparse data streams and the use of 
various custom compression algorithms, the data rates are still too 
large to store the data indefinitely—as much as 50 terabytes per second, 
resulting in as much data every hour as Facebook collects globally in a 
year. In this section we first motivate why it is necessary to produce 
such immense data samples, before discussing how machine learning 
is being used to more effectively select—in real time—which data to 
keep for further studies and which data to permanently discard. In 
addition, we show how the use of machine learning is leading to more 
efficient processing of these data using vast computing resources dis- 
tributed around the world. Both of these big-data challenges must be 
overcome before the LHC data can be used to advance our knowledge 
of fundamental particles. 
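
The data rates quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the two figures given in the text (about one petabyte per second from the sensors and as much as 50 terabytes per second after the readout electronics); the use of decimal SI prefixes is an assumption.

```python
# Figures quoted in the text; decimal (SI) prefixes are an assumption.
TB = 10**12  # bytes in a terabyte
PB = 10**15  # bytes in a petabyte

raw_rate = 1 * PB        # bytes per second produced by the sensor arrays
reduced_rate = 50 * TB   # bytes per second after the readout electronics

# Factor by which the readout electronics reduce the data rate.
reduction_factor = raw_rate // reduced_rate

# Volume that would accumulate in one hour at the reduced rate.
bytes_per_hour = reduced_rate * 3600

print(f"reduction by readout electronics: x{reduction_factor}")
print(f"volume per hour after reduction: {bytes_per_hour // PB} PB")
```

Even after a twentyfold reduction, an hour of running corresponds to well over a hundred petabytes, which is why further selection must happen in real time.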


LHC operations 

Einstein famously related mass m to energy E via E = mc², where c is the 
speed of light in a vacuum. A powerful particle accelerator such as the 
LHC, which is 27 km in circumference, is therefore required to create 
particles orders of magnitude more massive than the proton, such as 
the Higgs boson. A Higgs boson is produced only once every few billion 
proton-proton collisions at the LHC. Many other interesting reactions 
occur orders of magnitude less often. To enable such data samples to 
be recorded in a reasonable time frame, the LHC collides nearly one 
billion protons per second. 

High-energy collisions can produce hundreds of particles, and 
disentangling such complex events requires detectors with large and 
diverse sensor arrays. The ATLAS and CMS detectors each contain 
roughly 100 million detection elements. Most of the particles pro- 
duced in the LHC experiments decay before they can be detected 
by any of the sensors. Therefore, LHC analyses must infer what the 
underlying reactions were on the basis of the properties of the particles 
that are detected. A wide variety of sensor technologies are used in 
the LHC detectors. The various signals from the particles that are 
detected by these sensor arrays are digitized, converting the physical 
processes involving subatomic particles into large collections of bytes. 
The extreme rate at which the LHC collides protons, along with the 
size and complexity of the LHC detectors, results in the production of 
enormous data samples. 


Real-time analysis 

The LHC experiments use data-reduction schemes executed in real 
time, referred to as triggers, to identify which data to retain for future 
analysis and which to permanently discard. For example, the ATLAS 
and CMS experiments each keep only about 1 in every 100,000 events. 
Despite this, their data samples are each still about 20 petabytes per 
year. The first step in deciding which events to keep relies on logic that 
is encoded directly into the hardware to enable the fastest possible deci- 
sions, such as into devices known as field-programmable gate arrays 
(FPGAs). Machine learning is already used in this environment; for 
example, CMS uses machine learning in its trigger hardware to better 
estimate the momentum of muons, with the inputs to the algorithm 
discretized to enable the machine-learning response to be encoded in 
a large look-up table that is easily programmed into the FPGAs. 
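
The idea of encoding a machine-learning response in a look-up table can be sketched as follows. This is only an illustration under stated assumptions: `toy_model`, the two input names (`bend`, `angle`), their ranges and the bin counts are all hypothetical, and the function is a stand-in for a trained regression, not the CMS algorithm.

```python
def quantize(value, lo, hi, n_bins):
    """Map a real value to an integer bin index in [0, n_bins - 1], clamping at the edges."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_bins - 1
    return min(n_bins - 1, int((value - lo) / (hi - lo) * n_bins))


def bin_centre(index, lo, hi, n_bins):
    """Representative input value for a given bin."""
    return lo + (index + 0.5) * (hi - lo) / n_bins


def toy_model(bend, angle):
    """Hypothetical stand-in for a trained momentum regression."""
    return 1.0 / (abs(bend) + 0.05) + 0.1 * angle


# Hypothetical input ranges and bin counts: (low edge, high edge, number of bins).
BEND = (-1.0, 1.0, 16)
ANGLE = (0.0, 3.0, 8)

# Evaluate the model once per cell, offline; only this table is needed at run time.
LUT = [
    [toy_model(bin_centre(i, *BEND), bin_centre(j, *ANGLE)) for j in range(ANGLE[2])]
    for i in range(BEND[2])
]


def lookup(bend, angle):
    """Run-time response: two quantizations and one table read, no arithmetic on the model."""
    return LUT[quantize(bend, *BEND)][quantize(angle, *ANGLE)]


print(lookup(0.3, 1.0))
```

The design trade-off is between table size, which grows with the product of the bin counts, and the precision lost by discretizing the inputs.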

In addition, the LHC experiments use huge computing farms to pro- 
cess the extreme volumes of data and search for interesting signatures. 
In the case of the LHCb experiment, many of the reactions of greatest 
interest do not produce striking signatures in the detector, making it 
necessary to thoroughly analyse high-dimensional feature spaces in 
real time to efficiently classify events. Since the first year of LHCb 
data collection, the primary algorithm used for such classification has 
been machine-learning-based; specifically, a BDT was used for the 
first two years, which has since been replaced by a MatrixNet algorithm. 
The use of machine learning is now ubiquitous, which has 
greatly improved performance while satisfying the stringent robustness 
requirements of a system that makes irreversible decisions. Currently, 
70% of all data retained are classified by machine-learning algorithms 
and all charged-particle tracks are vetted by neural networks. As an 
example of the effect of these machine-learning methods, achieving the 
same sensitivity as a recent LHCb search for the dark-matter analogue 
of the photon, which was performed using data collected in 2016, 
would have required 10 years of data collection without the use of 
machine learning. 


Actionable insights from computing metadata 

Processing of the industrial-scale data samples collected by the LHC 
experiments is performed using the computing resources of the LHC 
Computing Grid, which are distributed across dozens of centres world- 
wide. The massive volumes of data moved between grid centres, and 
the large number of CPU processing jobs used to access and analyse 
these data, generate an enormous amount of metadata information 
from which actionable insights can be extracted. Machine-learning 
techniques have recently begun to play a crucial part in increasing the 
efficiency of computing-resource usage at the LHC. One example 
is predicting which data will be accessed the most, as currently monitored 
by CMS and LHCb, so that it becomes possible to optimize 
data storage at the grid centres. Another example involves monitoring 
data-transfer latencies over complex network topologies at CMS, 
using machine learning to identify problematic nodes and to predict 
likely congestion. Currently, machine learning informs the choices of 
the computing-operations teams, but in the future it could form the basis of 
fully automatic and adaptive models. 


Machine learning as an established tool 

After identifying and recording the most interesting LHC events and 
processing them on the Computing Grid—two vital tasks supported 
by machine learning—the data are ready for exploration. The first step 
in interpreting these data involves grouping the signals recorded by 
various sensor elements according to which particle created them. 
The types and properties of the particles can then be inferred from the 
subsets of event information associated with them. Finally, after recon- 
structing all detected particles in the event, the data are analysed to 
determine the underlying physical processes that created the particles. 
Fig. 1 | Machine learning for calorimetry at CMS. The mass distribution 
of Z bosons that decay to electron-positron pairs (Z → e⁺e⁻), as measured 
in the central part of the CMS detector and binned into 1-GeV bins, is 
shown for three cases: using only the raw information from the detector 
(orange), after clustering the data (green) and after applying the machine- 
learning-based corrections discussed in the text (blue). The true position 
of the peak for this decay is 91 GeV. Image adapted from ref. '°! under a 
CC BY 4.0 license, copyright CERN, reused with permission. 


Interpreting such complex data samples is an extremely challenging 
task, which has been revolutionized by the use of machine-learning 
techniques. About 2,000 journal articles have been produced by the 
LHC experiments to date, providing a large library of examples of the 
use of machine learning with these types of complex dataset. In this section 
we discuss a few highlights, including the role of machine learning 
in the discovery of the Higgs boson. 


Determining particle properties 

The use of machine learning to improve the determination of particle 
properties is now commonplace at all of the LHC experiments. For 
example, BDTs are used to increase the resolution of the CMS electromagnetic 
calorimeter. When an electron or photon enters such a 
detector, it rapidly loses its energy, which is subsequently collected and 
measured by the calorimeter. This deposited energy is often recorded 
by many different sensors and the readings from these sensors must 
be clustered together to recover the original energy of the particle. 
Multivariate regression is used by CMS to train BDTs to provide cor- 
rections to these inferred energies on the basis of all of the information 
contained in each calorimeter sensor. Applying these energy correc- 
tions to the decay of a Z boson into an electron-positron pair results 
in a substantial improvement in mass resolution compared to the tra- 
ditional clustering approach (see Fig. 1). 


Discovery of the Higgs boson 

As stated above, a Higgs boson is produced only once every few billion 
proton-proton collisions at the LHC; however, the Higgs boson usually 
decays in ways that mimic much more copiously produced processes. 
The cleanest experimental signature of the Higgs boson involves its 
decay into two muon-antimuon pairs, which occurs roughly once every 
10 trillion proton-proton collisions. This and a few other processes 
were used in the Higgs discovery analyses. Most were selected owing to 
their striking experimental signatures, which made it possible to obtain 
pure signals using relatively simple analyses. An important exception 
was the analysis of the Higgs boson decaying into two photons by the 
CMS experiment. 
Fig. 2 | Separating signal events from background in the ATLAS 
experiment. a, The BDT-score distribution for a search for the Higgs 
boson decaying to a tau-lepton pair (H → ττ), with bin widths of 0.17. 
The black circles show the score of a machine-learning algorithm known 
as a BDT for data from the ATLAS detector during the 2012 data-taking 
period, where the error bars show the Poisson statistical uncertainties. 
This BDT was trained to distinguish a Higgs signal from various non- 
Higgs backgrounds. The coloured area shows the stacked contributions 
of the different background processes: Z → τ⁺τ⁻ decays (blue), other 
particles decaying to a tau-lepton pair (brown) and fake tau particles 
(where at least one tau lepton is misidentified; green). The dotted red line 
shows the expected total counts assuming a Higgs-boson production rate 
identical to the standard-model expectation (μ = 1); the solid red line 
shows the expected total counts assuming a Higgs-boson production rate 
of μ = 1.4, which is still compatible with the standard model; the hatched 
area shows the systematic uncertainty on the expected total count. The 
excess counts compared to the coloured region (mainly in the rightmost 
two bins) are attributed to the Higgs boson. b, The ratios of the data (black 
circles), expected counts for μ = 1 (dashed red line), expected counts 
for μ = 1.4 (solid red line) and uncertainty (hatched region) from a to 
the expected counts for μ = 1.4 (so the solid red line is identically 1) are 
shown, along with the ratio of the expected counts excluding the Higgs 
contribution (the sum of the green, brown, blue and hatched regions in a) 
to the expected counts for μ = 1.4 (black line). Image adapted from ref. 
under a CC BY 4.0 license, copyright CERN. 


The CMS analysis involved searching for a small excess of diphoton 
candidates, manifested as a narrow peak in the diphoton mass spec- 
trum, in the presence of a large smoothly distributed background. This 
background largely consisted of diphotons that originated from pro- 
cesses other than the Higgs decay and from candidates formed from 
one real photon combined with an artificial photon signal (that is, a 
photon inferred from the detector signals that did not correspond to an 
actual photon produced in the physical process). Two BDTs were used 
to improve the diphoton mass resolution by better determining which 
proton-proton collision the photons were produced in. Because both 
the standard-model Higgs process and the dominant background pro- 
cesses are well understood, it was possible to use simulated data samples 
to train a BDT. On the basis of the response of this BDT, the CMS 
diphotons were either discarded or kept for further analysis. The dipho- 
tons selected were also categorized using the BDT response, making 
it possible to analyse a rare—but highly pure—subset of Higgs decays 
separately. A simultaneous fit was performed to the mass distributions 
of all categories, which greatly enhanced the sensitivity to the presence 
of a Higgs signal. The increase in sensitivity due to the use of machine 
learning was equivalent to collecting 50% more data. 
Determining the properties of the Higgs boson 

The standard model contains only one Higgs boson, which is the sim- 
plest explanation for the phenomenon known as electroweak symmetry 
breaking. Many extensions to the standard model predict that there are 
many Higgs bosons; for example, super-symmetric theories predict a 
rich Higgs sector, and other theories predict that the Higgs boson is 
a composite object, not a fundamental particle. The standard model 
provides precise predictions for the properties of the Higgs boson and 
it is vital that these predictions are tested experimentally to determine 
the nature of the Higgs particle discovered at the LHC. 

The Higgs-boson discovery analyses firmly established its interac- 
tions with the electroweak-force-carrying particles, namely the pho- 
ton, W boson and Z boson. The standard model also predicts that the 
Higgs boson interacts with fermions (quarks and leptons) and that the 
strength of each of these interactions is proportional to the masses of 
the fermions. This means that the Higgs boson is expected to decay into 
heavier quarks and leptons more often than into their lighter cousins. 
The ATLAS and CMS experiments have thus far observed the Higgs 
boson decaying into the heaviest kinematically accessible quark, the 
beauty quark, and into the most massive lepton (a heavier version 
of the electron known as the tau lepton). Machine learning had a 
major role in each of these discoveries, although we describe only the 
ATLAS search for the decay of the Higgs boson into an antitau-tau pair 
(H → τ⁺τ⁻) in detail here. 

The study of tau particles is challenging because they decay before 
being detected and because their decays always involve neutrinos 
that escape detection and carry away energy. Furthermore, the decay 
Z → τ⁺τ⁻ occurs about 1,000 times more often than does H → τ⁺τ⁻. 
The ATLAS analysis divided the data sample into six distinct kinematic 
regions. A BDT was trained in each region using 12 weakly 
discriminating input features. In Fig. 2 we show an example BDT 
response distribution obtained in one region. The combined analysis 
of all six regions provided strong evidence for the realization of the 
Higgs boson coupling to tau leptons in nature, with about 40% better 
sensitivity achieved through the use of machine learning. Thus far, the 
interactions of the Higgs boson with quarks and leptons appear to be 
consistent with standard-model predictions. The simulation that led 
to this result was eventually released through Kaggle as the basis of 
the 2014 Higgs Machine Learning Challenge, where data scientists 
competed to provide alternative machine-learning methods to isolate 
the H → τ⁺τ⁻ signal. Table 1 shows the impact of machine learning 
on the measurement of several key processes involving Higgs bosons 
at ATLAS and CMS. 


A high-precision test of the standard model 

The standard model predicts that only three out of every billion Bₛ 
particles—bound states that contain a beauty quark—decay into a 
muon-antimuon final state. The fact that this decay rate is so highly 
suppressed in the standard model contributes to it being extremely 
sensitive to potential quantum effects induced by as-yet-unknown 
particles, especially from an extended Higgs sector; for example, certain 
super-symmetric theories predict an order-of-magnitude enhancement 
in this decay rate. The CMS and LHCb experiments were the first to 
find evidence for this decay, using data samples collected in the first 
few years of the LHC, and a combined analysis of these datasets produced 
the first observation of it. The analyses used BDTs to reduce 
the dimensionality of the feature space—excluding the mass—to one 
dimension and then an analysis was performed of the mass spectra 
across bins of BDT response. This approach preserved as much infor- 
mation as possible about the mass spectra of both the signal and back- 
grounds, providing the best possible sensitivity to this extremely rare 
decay of the Bₛ meson into a muon-antimuon final state. The decay 
rate observed is consistent with the standard-model prediction with 
a precision of about 25%, which places stringent constraints on many 
proposed extensions to the standard model. Finally, a more recent 
update from the LHCb experiment achieved the first single-experiment 
observation; achieving a similar sensitivity without the use of machine 
learning would have required the collection of about four times as much 
data. This is just one of many examples of high-precision tests of the 
standard model at the LHC for which machine learning has markedly 
increased the power of the measurement. 


Table 1 | Effect of machine learning on the discovery and study of 
the Higgs boson 

Analysis          Years of data   Sensitivity without   Sensitivity with     Ratio of   Additional 
                  collection      machine learning      machine learning     P values   data required 
CMS H → γγ        2011-2012       2.2σ, P = 0.014       2.7σ, P = 0.0035     4.0        51% 
ATLAS H → τ⁺τ⁻    2011-2012       2.5σ, P = 0.0062      3.4σ, P = 0.00034    18         85% 
ATLAS VH → bb     2011-2012       1.9σ, P = 0.029       2.5σ, P = 0.0062     4.7        73% 
ATLAS VH → bb     2015-2016       2.8σ, P = 0.0026      3.0σ, P = 0.00135    1.9        15% 
CMS VH → bb       2011-2012       1.4σ, P = 0.081       2.1σ, P = 0.018      4.5        125% 

Five key measurements of three decay modes of the Higgs boson H for which machine learning 
greatly increased the sensitivity of the LHC experiments, where V denotes a W or Z boson, γ 
denotes a photon and b a beauty quark. For each analysis, the sensitivity without and with 
machine learning is given, in terms of both the P values and the equivalent number of Gaussian 
standard deviations σ. (We present only analyses that provided both machine-learning-based and 
non-machine-learning-based results; the more recent analyses report only the machine-learning- 
based results.) The increase in sensitivity achieved by using machine learning, as measured by 
the ratio of P values, ranges roughly from 2 to 20. An alternative figure of merit is the minimal 
amount of additional data that would need to be collected to reach the machine-learning-based 
sensitivity without using machine learning, which varies from 15% to 125%. 
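
The two figures of merit in Table 1 can be reproduced from the quoted sensitivities. The sketch below rests on two assumptions that are not stated in the text: that the P values are one-sided Gaussian tail probabilities, and that the sensitivity in standard deviations grows with the square root of the amount of data. Under those assumptions the numbers agree with the table to the quoted precision.

```python
import math


def p_value(n_sigma):
    """One-sided Gaussian tail probability for a significance of n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def extra_data_fraction(sigma_without, sigma_with):
    """Fractional increase in data needed to go from sigma_without to sigma_with,
    assuming significance scales as the square root of the sample size."""
    return (sigma_with / sigma_without) ** 2 - 1.0


# (sensitivity without ML, sensitivity with ML), in standard deviations, from Table 1.
analyses = {
    "CMS H -> gamma gamma": (2.2, 2.7),
    "ATLAS H -> tau tau": (2.5, 3.4),
    "ATLAS VH -> bb (2011-2012)": (1.9, 2.5),
    "ATLAS VH -> bb (2015-2016)": (2.8, 3.0),
    "CMS VH -> bb": (1.4, 2.1),
}

for name, (without_ml, with_ml) in analyses.items():
    ratio = p_value(without_ml) / p_value(with_ml)
    extra = 100.0 * extra_data_fraction(without_ml, with_ml)
    print(f"{name}: P-value ratio {ratio:.1f}, additional data {extra:.0f}%")
```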


The emergence of deep learning 

Machine learning in particle physics, including the examples presented 
in the previous two sections, has traditionally involved the use of 
field-specific knowledge to engineer tools to extract the features of the 
data that are expected to be the most useful for a given measurement. 
This enables the incredibly rich initial data to be interpreted using 
only a small number of features. For example, in the aforementioned 
Bₛ decay, a human-designed tracking algorithm first reconstructs the 
paths taken by the muon and the antimuon in a magnetized parti- 
cle-physics detector, and from these paths the momenta of the particles 
are inferred. However, only the dimuon mass and the angle between 
them are used in the BDT. The rest of the kinematic information is 
discarded. 

For many tasks, information can be lost when these human- 
designed tools are used to extract features that fail to fully capture the 
complexity of the problem. As in the fields of computer vision and 
natural language processing, there is a growing effort in particle 
physics to skip the feature-engineering step and instead use the full 
high-dimensional feature space to train cutting-edge machine-learning 
algorithms, such as deep neural networks. In this approach, domain 
expertise is used to design neural-network architectures that are best 
suited to the specific problem. Studies of such applications have grown 
substantially in number and complexity within the past several years, 
beginning around 2014 with applications of deep neural networks to 
data analysis, quickly expanding to the first applications of computer 
vision and to the current broad study of deep learning throughout 
the field of particle physics. 

In this section we highlight a few recent applications of two types 
of deep learning algorithm in particle physics: convolutional and 
recurrent neural networks (CNNs and RNNs, respectively). The 
outputs of many particle-physics detectors can be viewed as images, 
and the application of computer-vision techniques is being explored in 
simplified settings by the LHC community and with initial studies 
on ATLAS and CMS simulations. However, such techniques are 
more naturally applicable in the area of neutrino physics, for which 
reason we focus our discussion of CNNs on neutrino experiments. 
Similarly, there are many applications of RNNs, but for brevity we 
discuss only their use for the study of high-energy beauty quarks at 
ATLAS and CMS. 
Computer vision for neutrino experiments 

Loosely inspired by the structure of the visual cortex, CNNs use a strategy 
that decreases their sensitivity to the absolute position of elements in an 
image and that makes them more robust to noise. Deep CNNs are able 
to extract complex features from images and now outperform humans 
in certain image-classification tasks. Another strength of CNNs is their 
ability to identify objects in an image, as demonstrated for example 
by their use in self-driving cars, owing to translation-invariant feature 
learning. This translational invariance presents a challenge for the LHC 
experiments, whose detectors consist of layers of distinct detector tech- 
nologies moving out from the proton-proton collision region. These 
detectors provide rich information in the absolute reference frame of 
the detector, which is transformed into a more natural format for a 
CNN-based approach. By contrast, this characteristic of CNNs is par- 
ticularly useful for neutrino experiments, which necessarily use large 
homogeneous detectors owing to the incredibly small probability that 
a neutrino will interact within a small volume of material. A neutrino 
interaction can take place anywhere within these detectors and locating 
them is a critical part of neutrino-physics analyses. 
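
The translation property described above can be illustrated with a toy example. In this sketch a single hand-written 3 × 3 filter (standing in for a learned one) is slid over two images that contain the same pattern at different positions. The pooled response is identical in both cases, while the location of the maximum in the feature map tracks where the pattern sits. The pattern, image size and function names are all illustrative assumptions.

```python
def correlate2d(image, kernel):
    """'Valid' 2-D cross-correlation, the core operation of a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def global_max(feature_map):
    """Global max pooling: discards position, keeps the strongest response."""
    return max(max(row) for row in feature_map)


def argmax2d(feature_map):
    """Row and column of the strongest response in the feature map."""
    best = max((v, r, c) for r, row in enumerate(feature_map) for c, v in enumerate(row))
    return best[1], best[2]


def place(pattern, top, left, size=8):
    """Embed a small pattern in an otherwise empty size x size image."""
    image = [[0] * size for _ in range(size)]
    for i, row in enumerate(pattern):
        for j, value in enumerate(row):
            image[top + i][left + j] = value
    return image


# Toy Y-shaped 'interaction vertex'; the filter is simply matched to it.
VERTEX = [[1, 0, 1],
          [0, 1, 0],
          [0, 1, 0]]

map_a = correlate2d(place(VERTEX, 0, 1), VERTEX)
map_b = correlate2d(place(VERTEX, 4, 5), VERTEX)

print(global_max(map_a), global_max(map_b))  # same pooled response
print(argmax2d(map_a), argmax2d(map_b))      # different locations
```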

The detectors of the NOvA experiment are filled with scintillating 
mineral oil, which emits light when a charged particle passes through 
it. Each NOvA event consists of two images: one taken from the top 
and the other from the side. The NOvA collaboration has developed 
a machine-learning algorithm composed of two parallel networks 
inspired by the GoogleNet architecture. The NOvA CNN extracts 
features from both views simultaneously and combines them to categorize 
neutrino interactions in the detector. This network, which 
improves the efficiency of selecting electron neutrinos by 40% with 
no loss in purity, has served as the event classifier in searches both for 
the appearance of electron neutrinos and for a new type of particle 
called a sterile neutrino. 

The detector at the MicroBooNE experiment, which contains 90 
tonnes of liquid argon, detects neutrinos sent from the booster neutrino 
beamline at Fermilab. Each MicroBooNE event corresponds to a 
33-megapixel image that probably contains background tracks caused 
by cosmic rays. Identifying signals of neutrino interactions in such 
events, in which both the signal and background tracks vary in size 
from a few centimetres to metres, is one of the most challenging tasks 
of the experiment. MicroBooNE recently demonstrated the ability to 
detect neutrino interactions using a CNN. Specifically, an algorithm 
called Faster-RCNN uses spatially sensitive information from intermediate 
convolution layers to predict a bounding box that contains the 
secondary particles produced in a neutrino interaction. In Fig. 3 we 
show an example output in which the network successfully localized a 
neutrino interaction with high confidence. Finally, by taking advantage 
of accelerated computing on GPUs, these CNNs can run much faster 
than the conventional algorithms used by previous neutrino experi- 
ments. This makes them ideally suited to the task of real-time image 
classification and object detection. 
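
How well a predicted bounding box matches a reference box is commonly quantified by the intersection over union (IoU) of the two rectangles. The text does not say which metric MicroBooNE used, so the sketch below is a generic illustration; the box format `(x_min, y_min, x_max, y_max)` and the example coordinates are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap region (zero if the boxes do not overlap).
    overlap_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    overlap_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = overlap_w * overlap_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical pixel coordinates for a reference box and a network prediction.
truth_box = (100, 200, 300, 400)
predicted_box = (110, 190, 310, 390)
print(f"IoU = {iou(truth_box, predicted_box):.2f}")
```

A value of 1 means the boxes coincide exactly and 0 means they do not overlap at all.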


RNNs for beauty-quark identification 

The study of high-energy beauty quarks is of great interest at the LHC 
because these particles are frequently produced in the decays of Higgs 
bosons and top quarks and are predicted to be important components 
of the decays of super-symmetric and other hypothetical particles. A 
high-energy beauty quark radiates a substantial fraction of its energy in 
the form of a collimated stream of particles, called a jet, before forming 
a bound state with an antiquark or two additional quarks. This radiation 
is emitted over a distance comparable to the size of a proton, making it 
impossible to observe the emission process directly. The beauty-quark 
bound states live for only a picosecond, corresponding to millimetre- 
to centimetre-scale flight distances at the LHC, before randomly 
decaying into one of a thousand possible sets of commonly produced 
particles. Therefore, to identify jets that originate from high-energy 
beauty quarks, it is necessary to be able to determine whether parti- 
cles were produced directly in the proton-proton collision or in the 
subsequent decay of a long-lived bound state at a location displaced 
from the proton-proton collision. Because jets typically contain 
between 10 and 50 particles, the number of potential discriminating 
features varies on a per-jet basis. Traditional jet-identification algorithms 
rely on either identifying secondary production points explicitly 
from the crossing of particle trajectories, a highly challenging task, or 
compressing the information with engineered features and neglecting 
the correlations between particles when using single-particle features. 
Although such algorithms have been combined with machine learning 
for some time, machine learning could also be used to improve 
identification by using the low-level particle features within a jet. 


Fig. 3 | Neutrino selection and isolation in MicroBooNE. The 
MicroBooNE event display shows a simulated neutrino interaction (inside 
the yellow box) overlaid on a cosmic-ray background image taken using 
the real detector. The yellow boxed region contains all charge depositions 
caused by secondary charged particles being produced in the simulated 
neutrino interaction. The CNN receives as input the display without the 
yellow box indicated and draws the red box, which matches the yellow box 
remarkably well and successfully captures the most interesting part of the 
neutrino interaction. Image adapted from ref. ”*, copyright Sissa Medialab, 
reused with permission of IOP Publishing. 

RNNs have proven to be extremely successful at processing long 
sequences of data, most famously acting as the core of Google’s cur- 
rent translation service*”. RNNs process sequences in such a way that 
information across the entire sequence can be accumulated and used. 
Applying an RNN to jet classification requires the particles in the jet to 
be ordered to form a sequence, such as by ranking them by how incom- 
patible they are with originating from the proton-proton collision. A 
set of features for each particle is provided to the RNN, which is trained 
to discriminate between beauty-quark jets and all other types of jet. The 
use of an RNN at the ATLAS experiment reduced the misidentifica- 
tion rate by a factor of four relative to a non-machine-learning-based 
algorithm’’. When the RNN is itself used as an input feature in the sub- 
sequent training of a BDT or neural network, the misidentification rate 
was reduced by a factor of three relative to the machine-learning result 
without the use of the RNN as an input feature”*. Similar approaches 
are also being explored at the CMS experiment”; more sophisticated 
RNN structures have been studied in a simplified setting and show 
promising results®°. 
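
As a rough illustration of the sequence-based approach described above, the following minimal sketch orders the particles in a jet and accumulates information across the sequence with a simple recurrent cell. The feature names (`pt`, `ip_significance`), the scaling and the weights are illustrative assumptions, and the cell is a basic tanh recurrence rather than the architecture used by ATLAS or CMS.

```python
import math


def order_particles(particles):
    """Put the particles least compatible with the primary collision first.

    'ip_significance' is a hypothetical stand-in for any measure of how
    incompatible a particle is with originating from the collision point.
    """
    return sorted(particles, key=lambda p: abs(p["ip_significance"]), reverse=True)


def rnn_score(sequence, w_in, w_rec, w_out):
    """Run a basic recurrent cell over the sequence and return a score in (0, 1)."""
    hidden = [0.0] * len(w_rec)
    for features in sequence:
        # The new hidden state mixes the current particle with the accumulated state
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[i], features))
                + sum(w * h for w, h in zip(w_rec[i], hidden))
            )
            for i in range(len(hidden))
        ]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy jet with three particles (values are made up for illustration)
jet = [
    {"pt": 12.0, "ip_significance": 0.4},
    {"pt": 30.0, "ip_significance": 5.1},
    {"pt": 8.0, "ip_significance": -2.3},
]
ordered = order_particles(jet)
sequence = [[p["pt"] / 100.0, p["ip_significance"] / 10.0] for p in ordered]

# Untrained, hand-picked weights: two hidden units, two input features
w_in = [[0.5, 1.0], [-0.3, 0.8]]
w_rec = [[0.1, 0.0], [0.0, 0.1]]
w_out = [1.2, -0.7]
score = rnn_score(sequence, w_in, w_rec, w_out)
```

In practice the weights would be learned from labelled jets; the point here is only that a jet with any number of particles is reduced to a single score.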


Training and validation 

The machine-learning algorithms used in particle physics are typically 
trained using supervised learning*! and data samples for which the true 
origins, identities and properties of the particles are known a priori. 
The algorithms learn to identify patterns in the training data, mak- 
ing it possible for them to predict information about particles in data 
samples for which expert labelling of data is impossible. It is vital that 
any machine-learning tool undergoes rigorous validation and testing 
and that the uncertainty on its performance is well understood. There 
is always the possibility that some features used by a machine-learning 
algorithm are not properly modelled in the training samples, which—if 
not properly accounted for—could lead to a false discovery. Ultimately, 
we use machine-learning tools to minimize uncertainties; the validation 
procedures discussed in this section are important for gaining confi- 
dence in the behaviour of these tools. 


Learning from simulation 

The need to understand what signals will look like in the detectors and 
what other processes can mimic the signals has led to the development 
of high-quality simulation tools. Furthermore, the standard model pro- 
vides accurate predictions of the rates and kinematic distributions of 
many of the processes that can mimic interesting signatures (referred 
to as backgrounds) and that contribute to particle-physics data sam- 
ples, providing important benchmarks for validating the simulation 
tools and understanding their uncertainties. Therefore, simulated 
data samples are often used to train the machine-learning algorithms 
because in such samples all information is known by construction. 
An important exception is that it is often possible to obtain highly 
pure background-only data samples, such as by using events collected 
under different experimental conditions, and such samples are often 
used as background samples during training. A hybrid approach is also 
possible. The MicroBooNE CNN discussed above was trained using 
simulated neutrino interactions overlaid on cosmic-ray background 
images taken with the real detector. 
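
The hybrid approach can be pictured as a pixel-by-pixel overlay. The sketch below is a simplified assumption of how such a training image might be built: it treats both images as small grids of deposited charge and simply adds them, ignoring any detector effects that a real overlay procedure would need to handle.

```python
def overlay(simulated, background):
    """Add simulated charge deposits onto a recorded background image."""
    same_shape = len(simulated) == len(background) and all(
        len(s_row) == len(b_row) for s_row, b_row in zip(simulated, background)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")
    return [
        [s + b for s, b in zip(s_row, b_row)]
        for s_row, b_row in zip(simulated, background)
    ]


# Toy 3x3 images (values are made up for illustration)
simulated_neutrino = [[0, 5, 0], [0, 7, 0], [0, 0, 0]]
cosmic_background = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
training_image = overlay(simulated_neutrino, cosmic_background)
```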


Testing for bias 

The quality and robustness of all machine-learning tools are vali- 
dated using well-known reactions recorded by the experiments. One 
approach, which is used by all LHC experiments®*’, involves con- 
structing data samples in which the data are fully understood without 
the use of machine learning. For example, the LHCb experiment uses 
J/ψ → μ⁺μ⁻ decays to validate its muon-identification neural network 
(μNN); J/ψ is a copiously produced charm-anticharm bound state, 
which can be identified with 99.9% purity when basing a selection cri- 
terion on the μNN response to either the antimuon (μ⁺) or muon (μ⁻). 
The identity of the other particle is therefore known without using the 
μNN, providing an unbiased data sample with which the performance 
of the μNN can be studied. Domain-specific knowledge is then used 
to transfer what is learned on these validation samples, in terms of 
both the expected performance and its uncertainty, to any analysis that 
uses that specific machine-learning algorithm. In the case of the μNN, 
the algorithm is studied in ranges of the values of muon and event- 
level properties and the response of the detector within these ranges 
is assumed to be independent of the process that produces the muon. 
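
The validation strategy described above (often called tag-and-probe) can be sketched as follows. This is a minimal illustration, not LHCb's procedure: the tag requirement `TAG_CUT`, the classifier responses and the binomial uncertainty are all assumptions made for the example.

```python
import math

TAG_CUT = 0.95  # hypothetical tight requirement on the tagging muon


def probe_efficiency(candidates, threshold):
    """Estimate the fraction of unbiased probe muons that pass `threshold`.

    `candidates` holds one (score_mu_plus, score_mu_minus) pair of classifier
    responses per J/psi candidate. Each muon is tried as the tag in turn; when
    the tag passes TAG_CUT, the other muon is counted as a probe.
    """
    passed = 0
    total = 0
    for score_plus, score_minus in candidates:
        for tag, probe in ((score_plus, score_minus), (score_minus, score_plus)):
            if tag >= TAG_CUT:
                total += 1
                if probe >= threshold:
                    passed += 1
    if total == 0:
        raise ValueError("no candidate has a muon passing the tag requirement")
    efficiency = passed / total
    # Simple binomial approximation to the statistical uncertainty
    uncertainty = math.sqrt(efficiency * (1.0 - efficiency) / total)
    return efficiency, uncertainty


# Toy classifier responses (values are made up for illustration)
candidates = [(0.99, 0.80), (0.97, 0.30), (0.50, 0.98), (0.96, 0.97)]
efficiency, uncertainty = probe_efficiency(candidates, threshold=0.5)
```

Because the probe is never selected using its own classifier response, the measured efficiency is not biased by the algorithm under study. A real measurement would be repeated in ranges of the muon and event-level properties.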

Another approach involves hybrid events, whereby the data are aug- 
mented with simulations to produce a test sample that mimics a signal. 
One example used by NOvA* takes abundant and pure muon-neutrino 
charged-current data and replaces the detected muon with a simu- 
lated electron. These hybrid events allow the performance of NOvA’s 
machine-learning algorithms to be studied on rare electron-neutrino 
charged-current interactions, which are expected to look identical to 
muon-neutrino charged-current interactions in the detector apart 
from the muon-to-electron swap. Similar techniques were used for 
the H → τ⁺τ⁻ decay by ATLAS* and CMS®. 

The approaches presented above are reminiscent of the procedures 
used to characterize the performance of complex detectors in past 
decades. Alternatively, tools developed by the machine-learning 
community can be used to probe the response of the algorithms. For 
example, t-distributed stochastic neighbour embedding (t-SNE)* is 
a non-parametric embedding technique that allows the proximity of 
points in a high-dimensional space to be visualized using only two 
dimensions. It can be used to study the groupings of different events 
according to the features extracted by a deep neural network. Events 
with overlapping extracted features, which the network interprets to be 
similar, are near each other in the t-SNE mapping; conversely, events 
with little or no overlap are far from each other in the mapping. These 
t-SNE projections are used to ensure that the groupings match the 
intuition about the physical processes being studied, to check whether 
non-training events are embedded as expected, and can even be used in 
conjunction with auto-encoder neural networks to search for anomalies 
Fig. 4 | Exploring NOvA’s event-selection neural network using 
t-SNE. The features extracted using NOvA’s neutrino-interaction CNN 
are projected into a two-dimensional space using the t-SNE method. 
The points represent events from the CNN training sample, with the 
colours denoting the true event types: muon-neutrino (ν_μ) charged- 
current interactions (dark blue), electron-neutrino (ν_e) charged-current 
interactions (light blue), tau-neutrino (ν_τ) charged-current interactions 
(yellow) and various neutrino (ν_x) neutral-current interactions (red). 


in large datasets. In Fig. 4 we provide an example of such a t-SNE using 
simulated neutrino interactions at the NOvA experiment. 
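
For reference, the standard formulation of t-SNE can be summarized as below. This is the generic textbook definition, not anything specific to NOvA: x_i denotes the high-dimensional feature vector of event i, y_i its two-dimensional image, N the number of events, and σ_i a per-point bandwidth fixed by the user-chosen perplexity. The embedding is found by minimizing the cost C with respect to the y_i.

```latex
\begin{align*}
p_{j|i} &= \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
&
p_{ij} &= \frac{p_{j|i} + p_{i|j}}{2N}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
&
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{align*}
```

Events whose extracted features are close have large p_ij, so the minimization places them close together in the two-dimensional mapping.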


Conclusions and outlook 

Within the next decade the LHC will increase the rate at which it 
collides protons by an order of magnitude, resulting in much higher 
data rates and even more complex events to disentangle. Neutrino- 
physics detectors will continue to increase in size and complexity. The 
tasks discussed in this Review will become even more challenging. 
Fortunately, machine learning is advancing rapidly, producing tools 
that are potentially applicable to a wide array of tasks in particle physics. 
By continuing to map the challenges faced in particle physics to those 
addressed by the machine-learning community, it is possible to turn 
the latest developments in machine learning into tools for discovery in 
high-energy particle physics, such as by conducting machine-learning 
competitions with LHC benchmark datasets 
(https://www.kaggle.com/c/trackml-particle-identification). We briefly discuss a few 
potential future applications below, which have already shown promising 
results for simplified test cases. 

The machine-learning community continues to discover powerful 
methods for processing and classifying complex data with inherent 
structure, such as trees and graphs. Complex data structures are prev- 
alent at the LHC. The set of particles that make up a jet can be mapped 
to a tree structure. We have already discussed how RNNs can be used 
to identify jets that originate from beauty quarks, but this is just one 
of the many potential applications of RNNs, or of graph convolutional 
networks, to the study of jets*”. 

Generative models, which learn the probability distribution of fea- 
tures directly, are capable of producing simulated data that closely 
approximate experimental data using tools such as generative 
The subplots show example event topologies from points in the two- 
dimensional t-SNE space, with the intensity of the colour indicating the 
amount of energy deposited and the axes denoting the spatial location of 
the charge deposits in the detector. The various types of event are clustered 
into distinct regions in the horizontal direction, while the multiplicity of 
the particles in each event is found to be correlated with the location of the 
events in the vertical direction. 


adversarial networks*® and variational auto-encoders®™”. A generative 
adversarial network uses one neural network, the ‘generator’, to generate 
potential data samples using random noise as input, while a second net- 
work, the ‘adversary’, penalizes the generator during training if the data 
that it generates can be distinguished from the training data. Although 
they are difficult to train, these networks can potentially generate large 
data samples much faster than can existing simulation tools, offering 
the possibility of providing the orders-of-magnitude-larger simulation 
samples that will be required by future experiments. Early work in this 
direction is encouraging”, demonstrating that accurate simulations 
of a simplified calorimeter can be produced while achieving a marked 
decrease in the computational resources required. 
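
The competition between the two networks is commonly written as a minimax problem. The expression below is the original generic formulation of a generative adversarial network, given here only as an illustration; the physics applications cited above may use variants of it. G denotes the generator, D the adversary (which outputs the probability that a sample comes from the training data), p_data the distribution of the training data and p_z the distribution of the input noise.

```latex
\[
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
\]
```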

The adversarial approach can also be applied to training classifiers 
with the ability to enforce invariance to latent parameters. This repre- 
sents a new way of making classifiers robust against systematic uncer- 
tainties® and is a viable approach to avoid biasing a physical feature 
such as mass®°. Several promising alternatives are also being investi- 
gated*°’, some of which have been used for analysis at LHCb”®. All 
of these approaches share the common theme of altering the training 
of the algorithms to reduce the potential bias learned. These are only 
a few of the machine-learning developments that are revolutionizing 
data interpretation in particle physics, greatly increasing the discovery 
potential of present and future experiments. 
Genome-centric view of carbon 
processing in thawing permafrost 


Ben J. Woodcroft, Caitlin M. Singleton, Joel A. Boyd, Paul N. Evans, Joanne B. Emerson, Ahmed A. F. Zayed, 
Robert D. Hoelzle, Timothy O. Lamberton, Carmody K. McCalley, Suzanne B. Hodgkins, Rachel M. Wilson, 
Samuel O. Purvine, Carrie D. Nicora, Changsheng Li, Steve Frolking, Jeffrey P. Chanton, Patrick M. Crill, Scott R. Saleska, 
Virginia I. Rich & Gene W. Tyson 


As global temperatures rise, large amounts of carbon sequestered in permafrost are becoming available for microbial 
degradation. Accurate prediction of carbon gas emissions from thawing permafrost is limited by our understanding of 
these microbial communities. Here we use metagenomic sequencing of 214 samples from a permafrost thaw gradient 
to recover 1,529 metagenome-assembled genomes, including many from phyla with poor genomic representation. 
These genomes reflect the diversity of this complex ecosystem, with genus-level representatives for more than sixty 
per cent of the community. Meta-omic analysis revealed key populations involved in the degradation of organic matter, 
including bacteria whose genomes encode a previously undescribed fungal pathway for xylose degradation. Microbial and 
geochemical data highlight lineages that correlate with the production of greenhouse gases and indicate novel syntrophic 
relationships. Our findings link changing biogeochemistry to specific microbial lineages involved in carbon processing, 
and provide key information for predicting the effects of climate change on permafrost systems. 


Permafrost thaw induced by climate change is predicted to make up to 
174 Pg of near-surface carbon (less than 3 m below the surface) avail- 
able for microbial degradation by 2100'. Prediction of the magnitude 
of carbon loss as carbon dioxide (CO₂) or methane (CH₄) is hampered 
by our limited knowledge of microbial metabolism of organic matter 
in these environments. Genome-centric metagenomic analysis of 
microbial communities provides the necessary information to examine 
how specific lineages transform organic matter during permafrost thaw. 
However, these methods are challenged by the inherent complexity 
and spatial heterogeneity of near-surface soil communities that sup- 
port diverse functional processes*~*. Previous metagenomic studies in 
permafrost-associated soils from Alaskan tundra*® and a mineral 
soil permafrost’ recovered a small number (14-33) of metagenome- 
assembled genomes (MAGs), which represent only a fraction of the 
species present in these systems. However, the ability to recover MAGs 
from complex microbial communities is continually improving in 
parallel with advances in sequencing technology and bioinformatic 
techniques®”. 


Recovery and distribution of MAGs 

The discontinuous permafrost at Stordalen Mire in northern Sweden 
is a model Arctic peatland ecosystem for studying thaw progression’”. 
To gain an understanding of the microbial communities at this site and 
their associated carbon metabolism, soil samples were collected from 
the active layer (seasonally thawed) of three sites across a thaw gradient: 
an intact palsa (thawed to approximately 30 cm), a partially thawed bog 
(approximately 60 cm), and a fully thawed fen. Although bogs and fens 
exist in diverse landscapes, thaw-associated shifts in hydrology cause 
them to be a common feature of thawing northern peatland perma- 
frost systems (see Methods). Triplicate soil cores and biogeochemical 


measurements were taken at each site from four active layer depths 
(near surface, mid, deep and extra-deep; 1-51 cm; Fig. 1, Supplementary 
Data 1) over three growing seasons. In total, 1.7 Tb of metagenomic 
sequence data were generated from 214 samples (2-165 Gb per sample), 
with supporting metatranscriptomic and metaproteomic data from 
a subset of these samples (Supplementary Data 1, 2). Metagenome 
assembly and differential coverage binning yielded 1,529 medium- 
to high-quality MAGs (more than 70% complete and less than 10% 
contaminated) from a diverse range of bacterial (1,434 genomes) and 
archaeal phyla (95 genomes; Extended Data Fig. 1a, Supplementary 
Data 3). The Stordalen Mire MAGs expand the number of genomes 
recovered from permafrost-associated soils by two orders of magnitude. 
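
The quality criterion quoted above (more than 70% complete and less than 10% contaminated) amounts to a simple filter over candidate genome bins. The sketch below assumes that completeness and contamination have already been estimated as percentages by some marker-gene-based tool; the bin names and values are made up for illustration.

```python
def is_medium_or_high_quality(completeness, contamination):
    """Apply the thresholds quoted in the text (both values in per cent)."""
    return completeness > 70.0 and contamination < 10.0


# Hypothetical bins with made-up quality estimates
bins = [
    {"name": "bin_001", "completeness": 95.2, "contamination": 1.3},
    {"name": "bin_002", "completeness": 68.0, "contamination": 2.0},
    {"name": "bin_003", "completeness": 81.5, "contamination": 12.4},
    {"name": "bin_004", "completeness": 70.5, "contamination": 9.9},
]
mags = [
    b["name"]
    for b in bins
    if is_medium_or_high_quality(b["completeness"], b["contamination"])
]
```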

To resolve the taxonomic distribution of the Stordalen Mire 
MAGs, phylogenetic trees were inferred from concatenated sets of 
single-copy marker genes (120 bacterial or 122 archaeal genes). The 
recovered MAGs spanned 30 phyla, including Bacteria belonging to 
Actinobacteria (385 genomes), Acidobacteria (364), Proteobacteria 
(205) and Chloroflexi (66), and Archaea from the Euryarchaeota 
(85). The Stordalen genomes substantially expand representation of 
several common soil-dwelling lineages (Extended Data Fig. 1), such as 
the ubiquitous Acidobacteria, for which genomic representation was 
increased threefold. MAGs were also recovered from many poorly char- 
acterized phyla, including 47 genomes from the bacterial candidate 
phylum Dormibacteraeota (AD3), 53 from Eremiobacteraeota (WPS- 
2), six from FCPU426 and eight archaea from the Bathyarchaeota 
(Extended Data Fig. 1, Supplementary Data 10, 11). The Stordalen 
genomes broadly represent the major groups present in the 
system (Extended Data Fig. 2a, b) as well as many lineages previously 
detected in other permafrost-associated environments®'!. On the 
basis of the diversity of ribosomal protein sequences detected in the 
Fig. 1 | Genome-resolved view of the microbial communities at 
Stordalen Mire. a, Schematic of the permafrost thaw gradient. Permafrost 
is light brown, active layer is blue (saturated peat) and dark brown 
(non-saturated peat). b, Community profile derived from MAG 
abundances (rows) across the active layer metagenomes (columns) from 
palsa (brown), bog (green) and fen (blue) samples. Black lines divide sites 


metagenomes, we conservatively estimate that more than 24,000 strains 
inhabit Stordalen Mire (Supplementary Note 1). The Stordalen MAGs 
represent about 60% of microorganisms in the mire at the genus level 
(Supplementary Note 2), making this, to our knowledge, the most 
comprehensive recovery of genomes from a complex, natural soil 
environment to date. 

Stordalen genomes were explicitly linked to the changing 
habitats by mapping their abundances across the thaw gradient 
(Fig. 1, Supplementary Data 4). Communities shifted substan- 
tially between sites (Extended Data Fig. 2c), with MAGs belong- 
ing to the Acidobacteria, Actinobacteria, Eremiobacteraeota, 
Alphaproteobacteria and Gammaproteobacteria predominant in the 
palsa and bog (5-41% of the community), whereas Deltaproteobacteria, 
Bacteroidetes, Chloroflexi and Ignavibacteriae were almost exclusively 
observed in the fen (8-14%; Extended Data Fig. 2e, f). The extra-deep 
bog samples had the lowest diversity (Shannon index 3.74 ± 0.24; 
Fig. 1), potentially owing to the ombrotrophic and anaerobic conditions 
in this environment, whereas the shallow samples from the minero- 
trophic fen were the most diverse (Shannon index 4.55 ± 0.05; Fig. 1). 
The fen also had 2.6 times more microbial cells per gram of soil relative 
to the palsa and bog (Extended Data Fig. 2d). In the bog, decreasing 
oxygen availability with depth is likely to drive changes in community 
structure, with Euryarchaeota and Dormibacteraeota increasing in 
relative abundance with depth, and Eremiobacteraeota decreasing. In 
the fen, the Planctomycetes, Omnitrophica and Spirochaetes increased 
in abundance with depth (Extended Data Fig. 2e, f). Consistent with the 
heterogeneity of soil environments”, individual MAGs were typically 
found only at high abundance (over 1%) in a limited number of samples 
(fewer than 4). However, a small number of MAGs belonging to the 
Acidobacteria, Actinobacteria, Proteobacteria and Euryarchaeota were 
ubiquitous at specific depths within the palsa, bog or fen (Extended 
and depth (E, extra-deep). Red lines separate samples taken above (left) 
and below (right) the water table. Numbers in parentheses show total 
MAGs recovered and phylogenetic gain of Stordalen MAGs compared to 
publicly available genomes for each phylum. Red text indicates previously 
poorly represented phyla. c, Shannon diversity of each sample (filled 
circles) or averages for the sample’s thaw stage and depth (open circles). 


Data Fig. 3a, b; Extended Data Table 1). For several genera, closely 
related MAGs were abundant at different depths (Extended Data 
Fig. 3c), reflecting fine-scale adaptation to distinct niches in the soil 
column. 
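
The Shannon index quoted above can be computed from the relative abundances of the populations in a sample. The sketch below is a generic implementation, assuming the natural logarithm (the text does not state the base) and using made-up abundance values.

```python
import math


def shannon_index(abundances):
    """Return H = -sum(p_i * ln p_i) over populations with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


# Toy samples: an even community is more diverse than a skewed one
even_sample = shannon_index([25, 25, 25, 25])
skewed_sample = shannon_index([97, 1, 1, 1])
```

For an even community of n populations the index equals ln(n), so values above 4, as reported for the fen, correspond to the evenness-weighted equivalent of more than about 55 equally abundant populations.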


Polysaccharide degradation 

Metabolic reconstruction of the MAGs, combined with 24 metatran- 
scriptome and 16 metaproteome datasets (Extended Data Table 2), 
allowed examination of the key microorganisms, pathways and inter- 
actions responsible for organic matter degradation and the produc- 
tion of greenhouse gases at Stordalen Mire (Fig. 2; Supplementary 
Data 1). The first stage in degradation involves the breakdown of high- 
molecular-weight plant-derived polysaccharides, primarily cellulose 
and hemicellulose, which make up a large proportion of peatland 
carbon. The ability to degrade these polysaccharides was a dominant 
feature of the Stordalen MAGs across all three thaw environments 
(Supplementary Data 5; Fig. 2 ‘MAG abundances’ and ‘distribution 
box plots’), with many encoding cellulases and xylanases (39% and 37%, 
respectively; average 3.8 and 2.6 copies per genome). This is consistent 
with gene-centric metagenomic studies of Arctic fens and tundras'>'°; 
however, the genome-centric approach used here links these metabolic 
functions to specific populations. 

Cellulase- and xylanase-encoding microorganisms, primarily 
belonging to the Acidobacteria (Fig. 2 ‘MAG abundances’), were 
most abundant in the palsa surface (68% and 59% of the recovered 
community, respectively), and decreased with depth (Fig. 2 ‘distribution 
box plots’). The surface bog had the lowest percentage of micro- 
organisms encoding these genes (34% and 24%, respectively), probably 
owing to breakdown inhibition through the production of acids by 
Sphagnum moss’’. The high relative abundance of cellulase- and 
xylanase-encoding Acidobacteria (61% and 75% of acidobacterial 
genomes, respectively) strongly suggests that they are the primary 
degraders of large polysaccharides in the palsa and bog (Figs. 1, 
2). Metatranscriptomic data confirmed these genomic inferences, 
with Acidobacteria producing the majority of cellulase and xylanase 
transcripts at these sites (Extended Data Fig. 4). Metaproteomic analysis 
revealed protein expression for 45 cellulases and 27 xylanases primar- 
ily belonging to Acidobacteria in the bog (Supplementary Data 2). A 
wider range of microorganisms are responsible for this functionality 
in the fen, including members of the Proteobacteria, Ignavibacteriae, 
Bacteroidetes, Verrucomicrobia, Chloroflexi, and Actinobacteria. 
However, the metatranscriptomic data indicate that the Proteobacteria, 
Ignavibacteriae and Bacteroidetes have the highest expression of these 
genes in the fen, although only a limited number of proteins were 


the sites. Line thickness connecting the intermediates represents the 
average relative transcript expression (‘pathway expression’) of pathway 
genes as transcripts per million reads mapped (TPM). Lines denote 
whether proteins were detected (solid) or not detected (dashed) in the 
metaproteomes. Relative abundances between sites were found to be 
significantly different for all pathways shown (see Methods). Coloured 
stars indicate relative abundances (ANOVA, P< 0.05) of pathways that are 
significantly different between depths. 


detected. Notably, most putative cellulose hydrolysers also encode 
a xylanase (59% of genomes), with the exception of actinobacterial 
hydrolysers, which typically encode only cellulases (87%). However, 
unlike findings in other Arctic systems'®, both metatranscriptomic 
and metaproteomic data show that the actinobacterial cellulases are 
not highly expressed, indicating that these microorganisms play a 
minor role in polysaccharide degradation at Stordalen Mire. The high 
abundance of hydrolysers in the palsa suggests that the microbial com- 
munity contributes to physical compaction through decomposition of 
surface organic matter, as evidenced by increases in bulk density with 
depth (Supplementary Note 3), and that this contribution is likely to 
augment thawing of underlying permafrost as the primary driver of 
subsidence. 
Breakdown of polysaccharides into simple sugars is the primary 
source of energy and carbon for the microbial community’®. 
β-Glucosidases for disaccharide degradation were encoded by the 
majority of microorganisms in all sites (75%, 84% and 66% of palsa, 
bog and fen communities, respectively), with transcript expression pri- 
marily by Acidobacteria in the palsa and bog, and Bacteroidetes in the 
fen (Extended Data Fig. 4). The metaproteomes supported the high 
expression of β-glucosidase proteins by Acidobacteria in the bog (198 
out of a total of 216 detected proteins). Degradation pathways for the 
monosaccharides glucose, galactose and xylose were also prevalent in 
the MAGs (Supplementary Note 4, Extended Data Fig. 5). Of the 237 
microorganisms potentially capable of xylan degradation, 108 appear to 
be involved in xylose degradation using the canonical bacterial isomer- 
ase pathway’? (Extended Data Fig. 6a). These genomes were common 
in the surface palsa (40% of the microbial community), deep bog (49%) 
and throughout the fen (51%), similar to the distribution of micro- 
organisms capable of degrading the precursor xylan. Transcription of 
this pathway in the fen was highest by Bacteroidetes and Ignavibacteriae 
(Fig. 2 ‘pathway expression’; Extended Data Fig. 6). Members of the 
Acidobacteria, Actinobacteria and Verrucomicrobia showed highest 
transcription of this pathway in the bog, whereas expression was 
limited to Actinobacteria in the palsa (Extended Data Fig. 6c-h). 
Notably, 50 actinobacterial MAGs encoded genes necessary for xylose 
degradation, despite 44 being unable to degrade the precursor xylan, 
indicating that they are reliant on the activity of xylan hydrolysers. 

Only a small fraction of the dominant acidobacterial xylan hydro- 
lysers encode the necessary genes for the canonical isomerase pathway 
for xylose degradation (30 out of 111 genomes). Few acidobacterial 
genomes (23) encoded the alternative xylonate dehydratase pathway, 
indicating that they might degrade xylose through a membrane-bound 
glucose dehydrogenase, as previously observed only in Gluconobacter 
oxydans*° (Supplementary Note 5). A closer inspection of acido- 
bacterial xylan-hydrolysing MAGs revealed an oxidoreductase pathway 
for the conversion of xylose into xylulose, previously identified only 
in fungi?!-*3 (37 genomes; Supplementary Note 5). MAGs belonging 
to the Actinobacteria and Chloroflexi also encoded this pathway, 
together comprising 13% of the community across the thaw gradient. 
Acidobacterial and actinobacterial genes for the oxidoreductase pathway 
were expressed in metatranscriptomes from across the mire, and were 
more highly expressed than the canonical isomerase pathway in both 
the palsa and bog (Fig. 2 ‘pathway expression’, Extended Data Fig. 6c-h). 
Nine MAGs expressed proteins for this pathway, primarily in the bog, 
confirming that this novel pathway is in use and is likely to account for 
a substantial fraction of xylose degradation at the mire (Supplementary 
Data 2). The detection and expression of several distinct pathways for 
xylose degradation, often occurring in the same genome (Extended Data 
Fig. 6b), reveals that one or multiple pathways may be active under 
specific environmental conditions (Supplementary Note 5). 


Fermentation 

In the anaerobic layers of the peat column, where inorganic terminal 
electron acceptors (TEAs) are rare**”*, fermentation and acetogenesis 
are essential pathways for the further degradation of monosaccharides, 
and supply the substrates for methanogenesis. Fermentation produces 
low-molecular-weight alcohols and organic acids such as ethanol, pro- 
pionate, acetate and lactate, as well as hydrogen and carbon dioxide. 
In the palsa and bog, lactate fermentation is a common metabolism 
encoded by actinobacterial and acidobacterial MAGs (Fig. 2 ‘MAG 
abundances’), which are particularly abundant in the bog surface (36% 
and 16% of the community, respectively), but decrease with depth 
(Fig. 2 ‘distribution box plots’). Transcript expression of this pathway, 
while low across all sites, appears to be mostly limited to these lineages 
in the bog (Extended Data Fig. 7). Conversely, populations belonging 
to the Chloroflexi, Ignavibacteriae, Bacteroidetes, and Proteobacteria 
appear to be the primary lactate metabolizers in the fen (9%, 7%, 5% 
and 5%, respectively), with Ignavibacteriae the most transcriptionally 
active (Extended Data Fig. 7). A small fraction of Stordalen genomes 
are capable of ethanol and propionate fermentation (Fig. 2 ‘MAG 
abundances’), and expression of these pathways is low and mainly 
limited to the palsa and bog (Extended Data Fig. 7). The abundance 
of these microorganisms in the palsa suggests that they are potentially 
important fermenters during the early stages of thaw (Fig. 2 ‘distribu- 
tion box plots’). Acetogens were most abundant in the fen across all 
depths, which suggests increased acetate production and is consistent 
with a preference for pH-neutral environments”*”> (Fig. 2 ‘distribution 
box plots’). Fen acetogens belong to the Ignavibacteriae, Bacteroidetes 
and Verrucomicrobia, whereas in the palsa and bog this metabolism 
was limited to a small number of Acidobacteria, Actinobacteria and 
Verrucomicrobia (Fig. 2 ‘MAG abundances’). These distributions were 
also observed in the metatranscriptomes (Extended Data Fig. 7), with 
Bacteroidetes, Ignavibacteriae and Proteobacteria contributing to the 
slightly higher expression of acetogenesis transcripts in the fen, com- 
pared to expression by Acidobacteria and Verrucomicrobia in the bog 
(Extended Data Fig. 7). While acetate can be oxidized using available 
inorganic TEAs?’ (for example, sulphate or nitrate), these are at very 
low concentrations in Stordalen Mire (Supplementary Data 6). The 
unexpectedly high ratio of CO₂ to CH₄ produced at the site?’ (16:1 in 
the bog and 7:1 in the fen; Extended Data Fig. 8a) may signal the oxi- 
dation of fermentation products including acetate using organic TEAs 
such as humic substances?”??*°, 


Methane metabolism 

Methanogenesis is the final step in anaerobic carbon transformation and 
is of critical concern in thawing permafrost peatland systems where CH₄ 
release is increasing rapidly*!. Of the 95 archaeal genomes recovered 
(Extended Data Fig. 1), 76 were identified as hydrogenotrophic 
methanogens (H₂- and CO₂-utilizing), which alongside high transcript 
and protein expression (Extended Data Fig. 8d, e) suggests that this is 
the dominant form of CH₄ generation at the mire. Hydrogenotrophic 
methanogens increased in abundance, diversity and activity as thaw 
progressed from bog to fen (Fig. 2 ‘MAG abundances’), consistent with 
the increase in CH₄ flux!®, with mid- and deep-fen samples having 
the highest relative abundance of these methanogens (Fig. 2 ‘distri- 
bution box plots’). Only six low-abundance acetoclastic methanogens 
were recovered, primarily from the fen, where acetogenesis was also 
prevalent (Fig. 2 ‘MAG abundances’). In addition, two H₂-dependent 
methylotrophic methanogens from the order Methanomassiliicoccales 
were recovered, but were present at very low abundance (0.1% in the 
fen) with low transcriptional activity, making it unlikely that they 
contribute substantially to CH₄ production at the mire (Extended 
Data Fig. 8d). Methanotrophs from the Alphaproteobacteria and 
Gammaproteobacteria were identified across the thaw gradient. 
High abundances in the bog suggest that methanotrophs may oxidize 
substantial proportions of CH₄, limiting emissions to the atmosphere 
(Supplementary Note 6). 


Microbial and geochemical interactions 

The activity of methanogens and methanotrophs alters the ¹³C/¹²C iso- 
topic ratio of CH₄ dissolved in the porewater*”. A previous 16S rRNA 
gene amplicon survey at Stordalen Mire revealed that the abundance of 
Candidatus (Ca.) ‘Methanoflorens stordalenmirensis’ was the best single 
predictor of carbon isotopic fractionation during CH₄ production at 
the bog in 2011. The recovery of 51 additional genomes here greatly 
expands the representation of the order Ca. ‘Methanoflorentales’, and 
revealed the presence of two habitat-specific clades derived from the 
bog and fen, respectively (80-85% average amino acid identity (AAI); 
Ca. ‘M. stordalenmirensis’ from the bog and Ca. ‘Methanoflorens 
crillii’ from the fen). The 16S rRNA gene-based correlation of Ca. ‘M. 
stordalenmirensis’ to the isotopic signature of CH₄ was confirmed by the 
metagenomic data for both 2011 and 2012, with the relative abundance 
of 19 Ca. ‘M. stordalenmirensis’ MAGs in the bog explaining more 
variation than bulk environmental variables (2011 R² = 0.43, 
P=6 x 10~*; 2012 R² = 0.48, P=2 x 10~*; Extended Data Fig. 8c, e). 
Notably, the relative abundance of a previously uncharacterized 
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Fig. 3 | Ca. ‘Acidiflorens’ geochemical correlations and metabolic 
reconstruction. a, Correlation of relative abundances of Ca. ‘Acidiflorens 
stordalenmirensis’ with 6'°C of porewater CH, in bog sites. b, Correlation 
of the relative abundances of Ca. ‘Methanoflorens stordalenmirensis’ and 
Ca. ‘Acidiflorens stordalenmirensis’ in bog samples. c, Correlation of the 
relative abundances of Ca. ‘Methanoflorens crillii? and Ca. ‘Acidiflorens’ 


acidobacterial population, Ca. ‘Acidiflorens stordalenmirensis, was sig- 
nificantly correlated with the isotopic composition of CH, (R? = 0.40, 
P=2 x 107°), and even more strongly correlated with the relative abun- 
dance of Ca. ‘M. stordalenmirensis’ (R? =0.82, P<2 x 107!° in bog 
sites; Fig. 3, Supplementary Note 7). 

A detailed metabolic analysis of Ca. ‘A. stordalenmirensis’ and the 
49 other genomes belonging to the genus Ca. ‘Acidiflorens’ revealed 
metabolic capabilities that suggest that members of this lineage are 
facultative syntrophs (Extended Data Fig. 1b). Members of Ca. 
‘Acidiflorens’ contain genes for the fermentation of a wide range of 
substrates including xylan, fatty acids, oxalate and fructose, and encode 
numerous hydrogenases, indicating the potential for Hz production 
and consumption (Fig. 3d, Supplementary Note 8). We hypothesize 
that the correlation between Ca. ‘A. stordalenmirensis’ and Ca. ‘M. 
stordalenmirensis’ and the CH, isotopic composition indicates that 
these lineages are in a syntrophic relationship based on inter-species 
hydrogen transfer. Hydrogen consumption by Ca. “M. stordalenmiren- 
sis’ is likely to lower the hydrogen partial pressure and thereby make 
fermentation more thermodynamically favourable for Ca. ‘A. stord- 
alenmirensis’*>”* (Fig. 3d). This syntrophy is also observed in the fen 
sites between closely related populations, as the relative abundances of 
a Ca. ‘Methanoflorens’ species (Ca. ‘M. crillii’; see above) and a second 
Ca. ‘Acidiflorens’ species (Ca. ‘Acidiflorens’ sp. 2), both of which are 
detected only in the fen, were also correlated (R?=0.19, P=1x 107%; 
Fig. 3c). The species-level resolution afforded by the genome-centric 
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sp. 2 in fen samples. d, Metabolic reconstruction of Ca. ‘Acidiflorens. 
Differential gene presence for the two lineages is indicated by colour- 
coding (Venn diagram). Functional units shown in grey are absent. Dotted 
lines indicate enzyme-independent hydrogen movement. Purple cell 
cartoon shows Ca. ‘Methanoflorens’ spp. consuming hydrogen produced 
by Ca. ‘Acidiflorens’ spp. 


metagenomic approach allowed identification of potential interactions 
between microorganisms and biogeochemistry that would have been 
missed using traditional gene amplicon surveys. 

Another key but poorly constrained biogeochemical parameter 
in global CH4 models*’ is the percentage of carbon mineralized to 
CO2 versus CH4. We directly examined the relationship of microbial 
lineages in the bog with the porewater CH4:CO2 ratio and identified 
a significant positive correlation with a genus within the candidate 
phylum Dormibacteraeota*’, named here Ca. ‘Changshengia’ 
(R² = 0.19, P=0.001; Extended Data Fig. 9a, c). Ca. ‘Changshengia’ 
was found to oxidize glycerol, an important cryoprotectant in this 
Arctic environment*”“®, and its derivatives glycerol 3-phosphate and 
dihydroxyacetone. On the basis of the transcript and protein expression 
of genes for glycerol oxidation (Extended Data Fig. 9b, Supplementary 
Note 9), it is possible that Ca. ‘Changshengia’ ferments glycerol, 
leading to the production*! of H2, which is transferred to methanogens, 
increasing the CH4 to CO2 ratio in the porewater. 


Conclusion 

Here, genome-centric metagenomic analysis of a permafrost 
thaw gradient allowed the recovery of 1,529 MAGs, substantially 
increasing the number of genomes sequenced from permafrost- 
associated environments. Analysis of these genomes and their 
abundances and expression enabled us to identify correlations between 
specific microbial populations and biogeochemistry, and revealed 
key populations that drive the mineralization of organic matter from 
plant-derived polysaccharides through to simple sugars, and the 
greenhouse gases CO2 and CH4. Future efforts that combine 
genome-centric meta-omic data with metabolomics and biogeochemical 
data will further improve our understanding of large-scale complex 
global processes, and inform Earth-system models for accurate 
predictions of climate-induced change. 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 

METHODS 

Study site. As described previously*4, Stordalen Mire is a peatland in northern 
Sweden (68° 22′ N, 19° 03′ E), 10 km southeast of Abisko. The three sub-habitats 
of the study site are common to northern wetlands, and together cover ~98% 
of Stordalen Mire’s non-lake surface!’. These proceed from well-drained palsas 
underlain by permafrost, and dominated by ericaceous and woody plants, to 
intermediate-thaw bogs with variable water table depth, dominated by Sphagnum 
spp. mosses, to fully thawed and inundated fens dominated by sedges such as 
Eriophorum angustifolium. A thaw-associated shift in these habitats was docu- 
mented between 1970 and 2000, as palsa collapsed and bogs and fens expanded 
by 3% and 54%, respectively!°. These habitats exist in an intermingled mosaic, as 
is common in discontinuous permafrost zones, and the specific palsa, bog and 
fen that were sampled in this study are directly adjacent such that all cores were 
collected within a 120m radius. 

This formation of wetlands after permafrost thaw is a widespread characteristic 
of peatlands affected by permafrost loss!?“°. As frozen ground thaws it collapses, 
forming bogs and fens. Where this subsidence increases hydrologic connectivity, as at 
Stordalen, it can create a progression from ombrotrophic bogs to minerotrophic fens. 
A similar successional shift from bogs dominated by Sphagnum spp. to tall sedge fens 
has been observed in other northern peatlands**“*“*4”, The uncertainty surrounding 
the extent and characteristics of wetland formation from permafrost thaw is a critical 
limitation to modelling and understanding carbon-climate feedbacks"*”. Improved 
characterization of post-thaw microbial communities and carbon transformation 
processes, as advanced in this study, can directly address this uncertainty. 

Geochemistry. Across the thaw chronosequence, porewater CH4 and CO2 
measurements and their 13C isotopic composition were sampled as described 
previously’. The δ13C-CH4 is affected by the δ13C-CO2, because of the use or production 
of CO2 during CH4 generation, so the isotopic fractionation factor is used to report 
the isotopic separation of CH4 and carbon dioxide*’. The αC value reports the 
effective fractionation of C in CH4, as the δ13C-CH4 relative to source material 
represented by δ13C-CO2. The effective fractionation factor of carbon in the 
porewater CH4 relative to CO2 (αC) was calculated as described previously**°°: 

$$\alpha_C = \frac{\delta^{13}\mathrm{C}\text{-}\mathrm{CO_2} + 1{,}000}{\delta^{13}\mathrm{C}\text{-}\mathrm{CH_4} + 1{,}000}$$
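
As a minimal sketch of this calculation, the fractionation factor can be computed as the ratio (δ13C-CO2 + 1,000)/(δ13C-CH4 + 1,000), with both δ values in per mil. The function name and the example values below are illustrative assumptions and are not taken from the study data.

```python
def fractionation_factor(delta13c_co2: float, delta13c_ch4: float) -> float:
    """Effective carbon fractionation factor (alpha_C) from delta values in per mil."""
    return (delta13c_co2 + 1000.0) / (delta13c_ch4 + 1000.0)


# Hypothetical porewater values (per mil), chosen only for illustration.
alpha_c = fractionation_factor(delta13c_co2=-10.0, delta13c_ch4=-70.0)
print(round(alpha_c, 4))
```

Because CH4 is depleted in 13C relative to CO2, the ratio is expected to exceed 1 for typical porewater samples.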


DNA extraction and metagenome sequencing. DNA extractions were under- 
taken as described previously*4, with additional extractions from samples taken 
in 2012. Metagenome sequencing was performed for 2011 and 2012 using 100 ng 
of the DNA in TruSeq Nano (Illumina) library preparation. For low concentration 
DNA samples, libraries were created using 1 ng of DNA with the Nextera XT DNA 
Sample Preparation Kit (Illumina). 2012 libraries were sequenced on 1/12th of an 
Illumina HiSeq2000 lane producing 100 bp paired-end reads, although some 2012 
and 2011 samples were selected for deeper sequencing. Libraries from 2011 were 
sequenced on 1/24th of an Illumina NextSeq, producing 150 bp paired-end reads. See 
Supplementary Data 1 for details of sequencing depth per sample. 

Quantitative real-time PCR. A quantitative polymerase chain reaction (qPCR) 
analysis was performed on selected samples to quantify microbial load. After 
pre-diluting 1/100, PCR was set up using 5 µl of 2× SYBR Green/AmpliTaq 
Gold DNA Polymerase mix (Life Technologies, Applied Biosystems), 4 µl of 
microbial template DNA and 1 µl of primer mix. The 16S 1406F/1525R primer 
set (0.4 µM) was designed to amplify bacterial and archaeal 16S rRNA genes: 
F - GYACWCACCGCCCGT and R - AAGGAGGTGWTCCARCC. The rpsL F/R 
primer set (0.2 µM), used for inhibition control, amplifies Escherichia coli DH10B 
only: F - GTAAAGTATGCCGTGTTCGT and R - AGCCTGCTTACGGTCTTTA. 
Three dilutions, 1/100, 1/500 and 1/1,000 (microbial template DNA, 16S 
1406F/1525R primer set) as well as an inhibition control (E. coli DH10B genomic 
DNA, rpsL primer set), were run in triplicate for each sample. The PCR was run 
on the ViiA7 platform (Applied Biosystems) including a cycle of 10 min at 95°C 
(AmpliTaq activation) and 40 cycles of (15s at 95°C followed by 20s at 55°C and 
30s at 72°C). A melt curve was produced by running a cycle of 2min at 95°C 
and a last cycle of 15s at 60°C. The cycle threshold (Ct) values were recorded and 
analysed using ViiA7 v1.2 software. 

CopyRighter?! v0.46 was applied to qPCR counts to correct for 16S copy number 
variation. CopyRighter normalizes the relative abundances across OTUs for each 
sample after dividing by the estimated copy number in a pre-computed table. 
The OTU genomic abundance is then obtained by multiplying by the total abun- 
dance number. A new CopyRighter database table was generated for the 2013 
GreenGenes taxonomy (Supplementary Data 8), with copy number estimates 
for leaf OTUs as the average copy number of IMG version 4.1 genomes mapped 
to GreenGenes genomes and clustered at 99% sequence identity, and for higher 
taxonomic levels inferred copy numbers for the clade common ancestor. The 
inferred copy numbers for higher taxonomies were propagated to descendent 
lineages without known copy numbers. 
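
The copy-number correction described above can be sketched as follows. This is an illustrative reading of the procedure, not the CopyRighter implementation: the function name, the dictionary inputs and the example numbers are assumptions.

```python
def copy_number_correct(rel_abundance, copy_number, total_abundance):
    """Convert 16S relative abundances into copy-number-corrected abundances.

    rel_abundance: {otu: fraction of 16S reads}
    copy_number: {otu: estimated 16S copies per genome}
    total_abundance: total abundance for the sample (for example from qPCR)
    """
    # Divide each OTU by its estimated 16S copy number.
    per_genome = {otu: rel_abundance[otu] / copy_number[otu] for otu in rel_abundance}
    # Renormalize so that the corrected fractions sum to one.
    total = sum(per_genome.values())
    # Scale by the total abundance to obtain per-OTU genomic abundance.
    return {otu: total_abundance * value / total for otu, value in per_genome.items()}


# Hypothetical two-OTU sample: equal read shares, different copy numbers.
corrected = copy_number_correct(
    {"otu_a": 0.5, "otu_b": 0.5}, {"otu_a": 1, "otu_b": 4}, total_abundance=1000.0
)
print(corrected)
```

In this toy case the OTU with four 16S copies per genome is down-weighted fourfold relative to the single-copy OTU before renormalization.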

SingleM. To determine microbiome diversity and community structure, SingleM 
was applied to reads from each sample (B.J.W. et al., unpublished materials, source 
code available at https://github.com/wwood/singlem). 

Diversity calculations. Shannon diversity was calculated based on SingleM 
counts, rarefying to 100 sequences per marker gene when >100 sequences were 
detected and excluding samples otherwise. Vegan*? was used to calculate the diver- 
sity given the rarefied SingleM OTU table across each of the 15 marker genes, and 
the average was plotted in Fig. 1. 
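
A minimal sketch of the rarefied Shannon calculation is given below in plain Python rather than with Vegan. It assumes that a marker gene with 100 or fewer sequences in a sample is simply skipped and that the remaining markers are averaged; the function names, the fixed random seed and this handling of excluded markers are assumptions.

```python
import math
import random
from collections import Counter


def shannon(counts):
    """Shannon diversity (natural log) of an {otu: count} table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def rarefy(counts, depth, rng):
    """Subsample `depth` sequences without replacement; None if not more than `depth`."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) <= depth:
        return None
    return Counter(rng.sample(pool, depth))


def mean_rarefied_shannon(marker_counts, depth=100, seed=0):
    """Average rarefied Shannon diversity across marker genes for one sample."""
    rng = random.Random(seed)
    values = []
    for counts in marker_counts.values():
        subsample = rarefy(counts, depth, rng)
        if subsample is not None:
            values.append(shannon(subsample))
    return sum(values) / len(values) if values else None
```

Here `marker_counts` maps each marker gene to its OTU count table for a single sample.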

Genome assembly and binning. Each sample's reads were assembled individually 
using CLC Genomics Workbench version 4.4 (CLC Genomics) with an estimated 
insert size of 50-500, generating 214 assemblies. Differential coverage binning 
was undertaken by mapping all reads from each sample of site (palsa, bog or fen) 
to all assemblies of that site, using BamM ‘make’ (M. Imelfort, unpublished mate- 
rials, http://ecogenomics.github.io/BamM/) version 1.3.8-1.5.0, BWA 0.7.12%4, 
samtools”°, and GNU parallel**. Each sample's scaffolds were then binned using 
MetaBAT 3127e20aa4e7~’ using the sample’s contigs and each of the BAM files as 
points of differential coverage. 

The CheckM* v1.0.4 ‘lineage_wf’ pipeline was used to determine completeness 
and contamination of the MAG bins through the identification and quantifica- 
tion of single-copy marker genes, making use of pplacer 1.1 alpha 16°”. Genomes 
estimated to be more than 70% complete and less than 10% contaminated were 
designated the ‘Stordalen MAGs’. 

MAG dereplication and taxonomic classification. When calculating relative 
abundance, to alleviate multi-mapping issues, genomes were first dereplicated at 
97% average nucleotide identity (ANI). First, amino acid identity was calculated 
between all genomes using the CompareM (v0.0.17) AAI workflow (‘comparem 
aai_wf’; D. H. Parks, unpublished materials, https://github.com/dparks1134/ 
CompareM). Genomes with an AAI of >95% were compared with each other 
using ‘calculate_ani.py’ (L. Pritchard, unpublished materials, https://github.com/ 
widdowquinn/scripts). Genomes with >97% ANI over >70% alignment were 
clustered together using single-linkage clustering, and the genome with highest 
quality in each cluster was chosen as the representative, where quality was 
calculated as ‘completeness − 4 × contamination’, as estimated by CheckM above. 
The cluster representative for each recovered MAG is provided in Supplementary 
Data 3. The CompareM AAI workflow was also used to determine average amino 
acid identities between cluster representatives to determine the specific clades 
(Supplementary Note 7). 
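
The clustering and representative selection can be sketched as below. This is a simplified illustration using a union-find structure for single-linkage clustering, not the scripts used in the study; the input layout and names are assumptions, while the thresholds (>97% ANI over >70% alignment) and the quality score follow the text.

```python
def dereplicate(genomes, ani_pairs, min_ani=97.0, min_aligned=70.0):
    """Single-linkage clustering of genomes with a quality-based representative.

    genomes: {name: (completeness, contamination)} as estimated by CheckM
    ani_pairs: iterable of (name_a, name_b, ani_percent, aligned_percent)
    Returns {representative: sorted list of cluster members}.
    """
    parent = {name: name for name in genomes}

    def find(name):
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Single linkage: any pair passing both thresholds merges two clusters.
    for a, b, ani, aligned in ani_pairs:
        if ani > min_ani and aligned > min_aligned:
            parent[find(a)] = find(b)

    clusters = {}
    for name in genomes:
        clusters.setdefault(find(name), []).append(name)

    def quality(name):
        completeness, contamination = genomes[name]
        return completeness - 4 * contamination

    return {max(members, key=quality): sorted(members) for members in clusters.values()}
```

With single linkage, genomes A and C end up in the same cluster whenever each is linked to B, even if A and C themselves fall below the threshold.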

Genome tree and phylogenetic inference of 1,529 population bins. Phylogenetic 
inference was conducted in order to classify the MAG bins and used an in-house 
pipeline described in detail previously, the genome taxonomy database (GTDB 
v2.1.15)*. In brief, sets of 122 archaeal and 120 bacterial specific single-copy 
marker genes were used to infer domain-specific trees incorporating the 1,529 
MAGs, a reference set of genomes from NCBI (RefSeq®! release 80), and the 
recently published UBA genomes™. The concatenated alignment of these marker 
genes was created using HMMER v3.1.b2, and used as a basis for FastTree v2.1.9 
tree building under the WAG +GAMMA model and using the approximately 
maximum likelihood method. This tree was then bootstrapped using genometreetk 
v0.0.35 (D. H. Parks, unpublished materials, https://github.com/dparks1134/ 
GenomeTreeTk), calculating bootstrap support from 100 FastTree iterations. 
The associated taxonomy was derived using NCBI annotations, and was used to 
decorate the tree using tax2tree™ and adjusted manually. Trees were visualized 
in ARB v6.0.6, and exported into ITOL® for further refinements before final 
editing in Inkscape. For the overall Bacteria and Archaea tree the dereplicated 
set of 647 genomes were selected in ARB and exported for viewing in ITOL. For 
the Acidobacteria tree (Extended Data Fig. 1b), Aminicenantes, including two 
Stordalen MAGs, and the recently reported Ca. ‘Fischerbacteria® were included 
as likely classes within the Acidobacteria based on GTDB analysis (http://gtdb. 
ecogenomic.org/). The bootstrapped Newick trees for the overall Bacteria and 
Archaea trees are found in Supplementary Data 10 and 11, using the alignments 
from Supplementary Data 12 and 13. Phylogenetic gain (Fig. 1, Extended Data 
Fig. 1) was calculated using genometreetk pd_clade, and based on the added 
phylogenetic distances introduced to current phyla (comprising RefSeq release 80 
and UBA genomes) by including the 1,529 Stordalen MAGs. 

Calculation of relative abundance. To calculate the relative abundance of each 
genome in each lineage, reads from each sample were mapped to the set of 
dereplicated genomes using BamM ‘make’. Low quality mappings were removed 
with BamM v1.7.3 ‘filter’ (minimum identity 95%, minimum aligned length 
75% of each read) and the coverage of each contig calculated with BamM ‘parse’ 
using ‘tpmean’ mode, so calculating the coverage as the mean of the number of 
reads aligned to each position, after removing the highest 10% and lowest 10% of 
positions. The coverage of each MAG was calculated as the average of contig 
coverages, weighting each contig by its length in base pairs. The relative abundance 
of each lineage in each sample was calculated as its coverage divided by the total 
coverage of all genomes in the dereplicated set. 
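
The coverage and relative abundance calculations described above can be sketched as follows. This is not the BamM implementation; it is a minimal illustration that assumes per-position read depths are already available for each contig, and all names are hypothetical.

```python
def trimmed_mean(depths, trim=0.10):
    """Mean depth after dropping the highest and lowest `trim` fraction of positions."""
    ordered = sorted(depths)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def genome_coverage(contig_depths):
    """Length-weighted average of per-contig trimmed-mean coverage."""
    total_length = sum(len(depths) for depths in contig_depths)
    weighted = sum(trimmed_mean(depths) * len(depths) for depths in contig_depths)
    return weighted / total_length


def relative_abundance(coverages):
    """Coverage of each genome divided by the summed coverage of all genomes."""
    total = sum(coverages.values())
    return {genome: cov / total for genome, cov in coverages.items()}


# Hypothetical sample with two dereplicated genomes.
print(relative_abundance({"genome_1": 3.0, "genome_2": 1.0}))
```

Trimming the extremes limits the influence of short, very highly covered regions, such as conserved genes that attract reads from other populations.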

Genomes that were differentially abundant by depth. To determine which 
lineages were differentially abundant between surface and deep samples, the set of 
relative abundances from each surface sample was compared to the set of relative 
abundances in the deep samples. The mean and statistical significance of the 
difference were calculated using R v3.3.2°”. To determine the average amino acid 
identity between pairs of samples, the ‘aai_wf’ of CompareM v0.0.7 (D. H. Parks, 
unpublished materials, https://github.com/dparks1134/CompareM) was run 
with the protein sequences predicted by Prokka as input. 

Annotation. Gene calling and preliminary annotation were undertaken with 
Prokka 1.11%. Each genome was annotated as either Archaea or Bacteria, based on 
an inferred domain derived from the genome tree detailed above. 

Annotation of glycoside hydrolase genes. All proteins predicted from all recovered 
genomes were screened using HMMSEARCH® with the dbCAN HMMs v5” 
and default parameters; results were then post-processed to remove hits with e > 1 × 10⁻¹⁸ 
and HMM coverage of <0.35, where coverage was calculated as (hmm_to − 
hmm_from)/qlen. Any gene with a hit passing these thresholds was then 
mapped to an EC number”! using DIAMOND v0.8.27.897, with a database of all 
genes annotated with a fully defined (four number) E.C. number. This database of 
E.C. annotated genes was generated by gathering a list of GenBank identifiers of 
all characterized genes from each CAZy webpage”? (listed at http://www.cazy.org/ 
Glycoside-Hydrolases.html) using a custom Ruby script and then downloading 
the corresponding protein sequences from GenBank. 
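
The hit filtering step can be illustrated with a short sketch. It assumes an E-value cutoff of 1 × 10⁻¹⁸ and a minimum coverage of 0.35, that a hit must pass both thresholds to be kept, and that coverage is (hmm_to − hmm_from)/qlen; the function name and argument layout are hypothetical.

```python
def passes_dbcan_filter(evalue, hmm_from, hmm_to, qlen,
                        max_evalue=1e-18, min_coverage=0.35):
    """Return True if a dbCAN HMM hit passes both the E-value and coverage cutoffs."""
    coverage = (hmm_to - hmm_from) / qlen
    return evalue <= max_evalue and coverage >= min_coverage


# Hypothetical hits: (evalue, hmm_from, hmm_to, qlen)
hits = [(1e-20, 10, 110, 200), (1e-10, 10, 110, 200), (1e-20, 10, 50, 200)]
kept = [hit for hit in hits if passes_dbcan_filter(*hit)]
print(len(kept))
```

Only the first hit is retained here: the second fails the E-value cutoff and the third covers too little of the model.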

Annotation of carbon metabolism. Annotation was undertaken using in-house 
scripts, which assign KEGG orthology to each gene via HMMs, taking the best 
hit and requiring an e <1 x 10°°. Encoding of whole pathways was inferred from 
genomes through the application of KEGG modules, both those available from 
KEGG as well as a number of custom modules (Supplementary Data 9). 

Etymology. Description of ‘Candidatus Methanoflorens crillii’ sp. nov. 

‘Candidatus Methanoflorens crillii’ [cril.lii. N.L. gen. n. ‘crillii’, named after 
Patrick Crill, Stockholm University, Sweden, in recognition of his work on 
understanding biogeochemical processes at the landscape scale (thawing permafrost), 
including greenhouse gas emissions under the impact of climate change]. 

‘Candidatus Methanoflorens crillii’ sp. nov. is the second species recognized in 
the genus ‘Candidatus Methanoflorens’. The description is as provided by Mondav 
et al. (2014) for the genus with the following additional properties. The species can 
be differentiated from the recognized ‘Ca. M. stordalenmirensis’ on the basis of 
phylogenetic analyses showing them to be monophyletic and sufficiently distinct 
average amino acid identity between encoded proteins. 

Description of ‘Candidatus Acidiflorens stordalenmirensis’ gen. et sp. nov. 

‘Candidatus Acidiflorens stordalenmirensis (A.ci.di-florens. N.L. n. acidum 
(from L. adj. acidus, sour), an acid; N.L. masc. substantive from L. masc. part. adj. 
florens, flourishing, to bloom; N. L. masc. n. Acidiflorens, an organism that blooms 
in acid; stor.da.len.mi-ren ‘sis. N.L. masc. adj. ‘stordalenmirensis, of or belonging to 
Stordalen Mire, Sweden from where the species was characterized). 

Description (brief). Phylogenetic analyses of genes/markers indicated 
that this species is different from all other recognized genera in the family 
Acidobacteriaceae. 

Description of ‘Candidatus Changshengia’ gen. nov. 

‘Candidatus Changshengia’ (Chan.gshen gia. N. L. fem. n. ‘Candidatus 
Changshengia’, named in honour of Changsheng Li of The University of New 
Hampshire, a developer of the DeNitrification-DeComposition (DNDC) ecosystem 
model that contributed to our understanding of the soil biogeochemical processes 
occurring in a variety of terrestrial ecosystems and climatic conditions). 

Candidatus Changshengia gen. nov. is the second proposed and characterized 
genus in the phylum Dormibacteraeota. The delineation of genus is based on average 
amino acid identity between encoded proteins. 

Description of ‘Candidatus Methanoflorentales’ order nov. 

“Candidatus Methanoflorentales’ (N.L. masc. adj. ‘Candidatus Methanoflorens, 
type genus of the order; suff. -ales, ending to denote an order; N.L. fem. 
pl. n. ‘Candidatus Methanoflorentales’ the order of the genus ‘Candidatus 
Methanoflorens’). 

The description is the same as given for the type genus ‘Candidatus 
Methanoflorens’ and the family ‘Candidatus Methanoflorentaceae’ Mondav et al. 
(2014) with the following modifications. The delineation of the order is determined 
by phylogenetic analyses showing that the Methanocellales would otherwise be 
paraphyletic. The order currently comprises two species, ‘Candidatus M. 
stordalenmirensis’ and ‘Candidatus M. crillii’. The type genus is ‘Candidatus Methanoflorens’. 

Production and consumption rates of methane. Per-cell methane production and 
consumption rates were taken from studies of isolate cultures (for production’*"”° 
0.19-4.5 fmol CH4 cell⁻¹ h⁻¹ and for consumption’”~” 0.2-40 fmol CH4 cell⁻¹ h⁻¹). 
The mean of the reported rates was used for production and for consumption, 
respectively. 

Bulk density measurements. Data are from one palsa core sample taken in 
July 2013. In the field, 50 cm³ aliquots of fresh peat from each core section were 
removed and frozen. In the laboratory, each 50 cm³ section of peat was weighed, 
freeze dried and then reweighed. Bulk densities were determined gravimetrically 
and calculated from the freeze-dried weights of the volumetric sections. Water 
contents were determined by the per cent change in weight of the peat before and 
after freeze-drying. 
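
As a worked illustration of these gravimetric calculations, the sketch below computes bulk density from the freeze-dried mass of a 50 cm³ section and water content as the per cent mass loss on drying. Expressing water content relative to the wet mass is an assumption, as are the function names and the example masses.

```python
def bulk_density(dry_mass_g, volume_cm3=50.0):
    """Dry bulk density in g per cm3 from the freeze-dried mass of a fixed volume."""
    return dry_mass_g / volume_cm3


def water_content_percent(wet_mass_g, dry_mass_g):
    """Per cent of the fresh (wet) mass lost during freeze-drying."""
    return 100.0 * (wet_mass_g - dry_mass_g) / wet_mass_g


# Hypothetical peat section: 50 g fresh, 5 g after freeze-drying.
print(bulk_density(5.0), water_content_percent(50.0, 5.0))
```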

Metatranscriptomics. Metatranscriptome sequencing was conducted on select 
samples from 2010, 2011 and 2012, comprising four palsa, eight bog and twelve 
fen samples. ScriptSeq Complete (Bacterial) low-input library preparation kits 
(Epicentre) were used with 240 ng of sample RNA that had been co-extracted along- 
side the DNA from the initial sample material as input as described previously. 
DNAse I (Roche) was used to remove residual DNA from the RNA after extractions. 
Agilent 2100 Bioanalyzer and Agilent 2200 Tapestation (Agilent Technologies) 
were used to check the quality of RNA and libraries during processing, with Qubit 
(ThermoFisher Scientific) used to determine quantity. These samples were run on 
1/8th of a NextSeq (Illumina) lane, with initial shallow runs conducted on 1/11th 
of a HiSeq (Illumina) and MiSeq (Illumina) lanes. Files originating from the same 
metatranscriptome libraries were concatenated before analysis. 

SeqPrep (J. A. St John, https://github.com/jstjohn/SeqPrep) was used to remove 
sequencing adaptors. PhiX contamination was removed by mapping the reads against 
the PhiX genome using BamM, and reads that aligned were removed. SortMeRNA 
v2.1°° was used to remove non-coding RNA sequences (tRNA, tmRNA, 5S, 16S, 18S, 
23S, 28S). To assign expression values to each gene, reads were first mapped in pairs 
to the dereplicated set of MAGs using BamM make, and filtered using BamM filter 
with cutoffs of 95% identity and 75% alignment. The count of reads mapped to each 
gene was calculated using DirSeq (https://github.com/wwood/dirseq, internally using 
BEDTools*!) based on any overlap of forward reads with the open reading frame 
of the gene, tabulating the sense and antisense mappings independently. To avoid 
the potential for DNA contamination of the RNA libraries to provide a misleading 
interpretation of a gene being expressed, the number of reads mapping in the sense 
direction were compared to the number mapping in the antisense direction using a 
one-sided binomial test. Genes with significantly more reads mapping in the sense 
direction (P < 0.05) were classified as ‘expressed’. For each significantly expressed 
gene, the number of antisense reads was subtracted from the number of sense reads to 
correct for metagenome contamination. These normalized expression estimates were 
used to calculate the TPM score*’, using only protein coding genes (CDS regions 
defined in the Prokka annotated GFF files) in each sample. 
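
The expression-calling logic can be sketched as follows. This is an illustration, not the DirSeq code: it assumes a null probability of 0.5 for a read mapping in the sense direction, assigns zero expression to genes that fail the test, and uses the standard TPM definition (length-normalized counts scaled to sum to one million). All names are hypothetical.

```python
from math import comb


def sense_pvalue(sense, antisense):
    """One-sided binomial P value for observing at least `sense` sense reads (p = 0.5)."""
    n = sense + antisense
    return sum(comb(n, k) for k in range(sense, n + 1)) / 2 ** n


def corrected_count(sense, antisense, alpha=0.05):
    """Sense minus antisense reads for expressed genes, zero otherwise."""
    if sense_pvalue(sense, antisense) < alpha:
        return sense - antisense
    return 0


def tpm(counts, lengths):
    """Transcripts per million from corrected counts and gene lengths (bp)."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}
```

A gene with five sense reads and no antisense reads gives P = 1/32, so it is called as expressed, whereas three sense and three antisense reads give no evidence of expression.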

Pathway expression was calculated as the average expression of the steps within 
a pathway. If a pathway step included an enzyme complex, the average expression 
of each subunit was used as the expression value of that step. If a reaction could 
be catalysed by more than one enzyme, or if multiple copies of an enzyme were 
encoded by a genome, then their summed expression was used as the expression 
value of that step. 
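
These rules can be expressed compactly in code. In the sketch below a pathway is a list of steps, each step is a list of alternative enzymes (or gene copies), and each enzyme is a list of subunit genes; this nested layout, and the choice to average subunits before summing alternatives, are assumptions made for illustration.

```python
def step_expression(step, expression):
    """Sum over alternative enzymes of the mean expression of each enzyme's subunits."""
    return sum(
        sum(expression[gene] for gene in enzyme) / len(enzyme) for enzyme in step
    )


def pathway_expression(pathway, expression):
    """Average expression across the steps of a pathway."""
    return sum(step_expression(step, expression) for step in pathway) / len(pathway)


# Hypothetical two-step pathway: a two-subunit complex, then two alternative enzymes.
expression = {"a": 10.0, "b": 20.0, "c": 6.0, "d": 4.0}
pathway = [[["a", "b"]], [["c"], ["d"]]]
print(pathway_expression(pathway, expression))
```

In this example the first step scores 15 (the mean of the two subunits) and the second scores 10 (the sum of the two alternatives), giving a pathway value of 12.5.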

Metaproteomics. Protein extraction, purification, and digestion. Metaproteome 
analysis was conducted on 22 samples from 2012, collected from the same cores 
and depths as material used for metagenomes and metatranscriptomes. Three 
metaproteomes were created by pooling replicate cores (3 x 3 replicates). Sample 
nomenclature denotes year and month, followed by habitat (P= palsa, S=bog, 
E= fen), core number (123 indicates replicate cores 1, 2 and 3 were pooled) 
and depth (surface = S, mid= M, deep = D, extra-deep = X). The 16 resulting 
metaproteomes were as follows: four palsa (20120600_P123M, 20120700_P3M, 
20120700_P3D, 20120800_P2M) six bog (20120600_S123M, 20120600_S123D, 
20120700_S2M, 20120700_S1D, 20120800_S1M, 20120800_S1X), and six fen 
samples (20120700_E3M, 20120700_E3D, 20120700_E2X, 20120800_E2M, 
20120800_E2D, 20120800_E3D) (Supplementary Data 1). Proteins were extracted 
and digested using substantial modifications of methods developed previously 
for our site*“. In brief, samples were thawed and 35 g of peat per sample was split 
equally into two 50 ml tubes, and sodium dodecyl sulphate (SDS)-resuspension 
buffer was added to a final volume of 30 ml. SDS-resuspension buffer (pH 8) was 
freshly prepared as: (1) an SDS buffer of 50 mM dithiothreitol (DTT) in 10 ml of 4% 
SDS, (2) a separate resuspension buffer of 50 mM trisaminomethane (Tris) Buffer 
(2.21 g Trizma-HCl and 4.36 g Trizma Base (Millipore Sigma)), 150 mM NaCl, 
1 mM EDTA, and HPLC-grade water up to 1 l, (3) the 10 ml SDS buffer (warmed 
at 60°C for 2 min) was mixed with 40 ml of resuspension buffer, and the final pH 
was adjusted to 8. Samples were vortexed for 10 min using a tabletop vortexer with 
adapters for 50 ml conical tubes, and then 10 g of 0.1 mm glass beads (Qiagen, 
Hilden, Germany) were added, followed by 30 min of vortexing. Samples were 
centrifuged at 3,000g for 20 min, the supernatant transferred to a new tube and 
centrifuged at 4,800g for 20 min. The supernatant was transferred to a new tube, 
to which 100% trichloroacetic acid (TCA) was added to a final concentration of 
30%. Samples were shaken and then stored at 4°C overnight. 

Samples were centrifuged at 4,800g for 1h 30 min at 4°C, and then supernatant 
was decanted and pellets from the same sample were combined. The following steps 
were repeated three times: pellets were washed with 1 ml cold acetone, placed on 
ice for 5 min, vortexed briefly, centrifuged at 24,000g for 25 min at 4°C, and then 
supernatant was removed. Pellets were dried under N2 gas, and then 1-1.5 ml 
of denaturing buffer was added. Denaturing buffer was prepared as follows: 
(1) a digestion buffer was prepared with 4.88 g of Trizma-HCl, 2.30 g of Trizma 
Base, and 1.11 g of CaCl2 brought to a 1 l volume with HPLC-grade water, 
(2) guanidine-HCl was added to digestion buffer to a final concentration of 6 M in a 
50-ml tube. Samples were incubated at 60°C for 1h, vortexing for 5s every 10 min, 
then transferred to a new tube, to which digestion buffer was added to a final 
volume of 15 ml. Proteins were digested by adding 20 µg trypsin (NEB) and 
incubating on a nutating mixer at 37°C overnight. 

A further 10 µg of trypsin was added to each sample, followed by incubation 
on a nutating mixer at 37°C for 4h. DTT was added (approximately 100 mg), then 
samples were returned to the 37 °C nutating mixer for 45 min and then centrifuged 
at 3,000 r.p.m. for 5 min at 20°C. The supernatant was transferred to a new tube. 
Similar to previously reported peptide purification protocols®, Sep-Pak Plus C18 
cartridges (Waters) were conditioned with 10 ml of acetonitrile + 0.1% formic acid, 
then washed with 10 ml of 0.1% formic acid (in water), then the samples were added 
and the flow-through was refiltered through the cartridges three additional times, 
followed by a wash with 10 ml of 0.1% formic acid (in water). Peptides were eluted 
with 5 ml of acetonitrile + 0.1% formic acid, then added to 0.45 µm Ultrafree-MC 
filter tubes according to the manufacturer’s protocol (Millipore Sigma). 

Peptide fractionation and mass spectroscopy. A bicinchoninic acid assay (Thermo 
Scientific, Rockford, IL) was performed to determine the peptide mass in each 
sample. The samples were then diluted with 10 mM ammonium formate, pH 10 
(‘buffer A’) to a volume of 930 µl, centrifuged at 10,000g for 2-5 min to remove any 
precipitates, and transferred to snap-cap ALS vials. The diluted samples (pH 10) 
were resolved on a XBridge C18, 250 × 4.6 mm, 5 µm with 4.6 × 20 mm guard 
column (Waters). Separations were performed at 0.5 ml/min using an Agilent 1100 
series HPLC system (Agilent Technologies, Santa Clara, CA) with mobile phases 
(A) buffer A and (B) buffer A/acetonitrile (10:90). The gradient was adjusted from 
100% A to 95% A over the first 10 min, 95% A to 65% A over minutes 10 to 70, 
65% A to 30% A over minutes 70 to 85, maintained at 30% A over minutes 85 to 
95, re-equilibrated with 100% A over minutes 95 to 105, and held at 100% A until 
minute 120. Fractions were collected every 1.25 min (96 fractions over the entire 
gradient) and every 12th fraction was pooled for a total of 12 fractions per sample. 
All fractions were dried under vacuum and 20 µl of nanopure water was added to 
each fraction for storage at −20°C until LC-MS/MS analysis. 
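
The pooling scheme can be made concrete with a short sketch. It assumes that ‘every 12th fraction’ means fractions 1, 13, 25 and so on were combined into the first pool, fractions 2, 14, 26 and so on into the second, and so forth; the function name and the 1-based numbering are illustrative.

```python
def pool_fractions(n_fractions=96, n_pools=12):
    """Group fraction numbers (1-based) so that every n_pools-th fraction shares a pool."""
    return [list(range(start, n_fractions + 1, n_pools)) for start in range(1, n_pools + 1)]


pools = pool_fractions()
print(pools[0])
```

Under this reading each of the 12 pools combines eight fractions spread evenly across the 120 min gradient (96 fractions at 1.25 min each).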

Fractions were analysed by reversed-phase LC-MS/MS using a Waters 
nanoAquity UPLC system coupled with a Q-Exactive Plus hybrid quadrupole/ 
Orbitrap mass spectrometer from Thermo Fisher Scientific. The analytical 
column was packed in-house by slurry packing 3-µm Jupiter C18 stationary phase 
(Phenomenex) into a 70-cm long, 360 µm OD × 75 µm ID fused silica capillary tubing 
(Polymicro Technologies Inc.). Mobile phases consisted of 0.1% formic acid in 
water (MP-A) and 0.1% formic acid in acetonitrile (MP-B). Samples were adjusted 
to a concentration of ~0.1 µg/µl and 5 µl injections were directly loaded onto the 
analytical column at a flow rate of 300 nl/min and 1% MP-B. The full loading, 
gradient elution, and column regeneration profile was as follows (min:%MP-B); 
0:1, 30:1, 32:8, 50:12, 105:30, 110:45, 120:90, 125:90, 130:1, 170:1. Data acquisi- 
tion (100 min) was started at the end of the sample loading period (30 min). The 
analytical column was coupled to the Q-Exactive using a home-built nanospray 
adaptor interface with 2.2 kV applied to achieve electrospray ionization. The MS 
inlet was maintained at a temperature of 300°C. A precursor scan was performed 
from m/z 300 to 1800 at a resolution of 30k and an automatic gain control (AGC) 
of 3 x 10°. Operated in data dependent mode, the top 12 most intense ions from the 
precursor scan were selected for high energy collision dissociation (HCD) MS/MS 
at a resolution of 17.5k, AGC of 1e5, isolation window of 2 m/z, and a max ion time 
of 100 ms. Only ions identified as having a +2 charge or higher were subjected to 
HCD and subsequently excluded from further analysis for 30s thereby allowing 
for deeper coverage. In total, 192 mass spectra files were generated (12 fractions 
for each of the 16 samples). 

Database search and expression analysis. A sensitive and universal database search 
tool, MSGFPlus* v2017.01.13, was used to conduct the metaproteome searches in 
this study. Prior to searching the metaproteomes, the mass spectrometer RAW output 
files were converted to the mzML format using msConvert of ProteoWizard® 
3.0.10200, accepting the default parameters. The mzML files were then searched 
against a targeted protein database containing protein sequences predicted from 
the metagenome-assembled genomes across the permafrost thaw gradient and 
involved in metabolic pathways examined in the manuscript, as well as the entire 
CDS regions of Ca. ‘Acidiflorens’, Ca. ‘Methanoflorens’ and AD3 (Supplementary 
Data 14). Proteins in the targeted database were dereplicated at 100% amino acid 
identity using usearch v9.2.64 (-fastx_uniques)* after converting all isoleucine 
residues to leucine (due to identical masses). In order to calculate the false 
discovery rate (FDR), a parallel search of a decoy protein database was conducted 
by using the (-tda 1) parameter of MSGFPlus during the indexing and searching 
steps of the targeted database. After conducting the searches, the FDR was 
calculated as: FDR(t) = #DecoyPSMs(t) / #TargetPSMs(t), where t is the 
highest Q value that gives an FDR of <1%, and #DecoyPSMs(t) and #TargetPSMs(t) are 
the numbers of decoy and target peptide-spectrum matches (PSMs), respectively, 
at that Q-value threshold. Only PSMs with Q < t (t = 0.0145) were considered in 
the results of this study. The maximum precursor mass tolerance allowed during 
the searches was specified by the (-t 20 ppm) parameter for parent mass tolerance 
and the (-ti "-1,2") parameter for isotope error range. Trypsin was specified as 
the digestion method (-e 1) and only full tryptic digestion was allowed (-ntt 2). 
Minimum and maximum peptide lengths to consider were 6 and 50, respectively. 
Minimum and maximum precursor charges to consider were 2 and 5, respectively. 
For each spectral scan, only the PSMs with the highest MSGF score (-n 1) were 
considered for subsequent analyses. 
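
The threshold selection described above can be sketched as follows. This is a minimal illustration rather than the study's code: it assumes each PSM is available as a `(q_value, is_decoy)` tuple, and that the counts at a threshold include every PSM scoring at or below it. The function name is hypothetical.

```python
def fdr_threshold(psms, max_fdr=0.01):
    """Return the highest Q-value threshold t with FDR(t) < max_fdr, or None.

    psms: iterable of (q_value, is_decoy) tuples.
    FDR(t) = (# decoy PSMs with Q <= t) / (# target PSMs with Q <= t).
    """
    ordered = sorted(psms, key=lambda psm: psm[0])
    decoys = targets = 0
    best = None
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        # Count all PSMs tied at this Q value before evaluating the FDR.
        while i < len(ordered) and ordered[i][0] == t:
            if ordered[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and decoys / targets < max_fdr:
            best = t
    return best


# Toy data: 200 target PSMs and 3 decoy PSMs.
example = (
    [(0.001, False)] * 150
    + [(0.002, True)]
    + [(0.003, False)] * 50
    + [(0.004, True), (0.004, True)]
)
threshold = fdr_threshold(example)
print(threshold)
```

In the toy data the threshold stops at the last Q value before the decoy count pushes the ratio to 1% or above.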

The detection of a unique peptide in at least one sample was considered 
evidence of expression of a specific enzyme. A specific pathway was designated as 
expressed when at least one enzyme from a pathway was detected in the 
metaproteome. Further, that enzyme had to be encoded in a genome where >50% of 
genomes in the 97% ANI genome cluster encoded all steps of that pathway. Only 
a very small number of proteins (<1%) were detected where the sequence of that 
protein was 100% identical to a protein from a different ANI genome cluster. 
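
The decision rule for calling a pathway expressed can be sketched as below. The data layout is an assumption made for illustration: `clusters` maps a cluster identifier to its member genomes and the pathway steps each genome encodes, and `detections` lists `(genome, step)` pairs for enzymes with at least one unique peptide detected. Genome and gene names in the example are placeholders.

```python
def cluster_supports_pathway(cluster, pathway_steps, min_fraction=0.5):
    """True if more than min_fraction of genomes in the cluster encode all steps."""
    complete = sum(1 for steps in cluster.values() if pathway_steps <= steps)
    return complete / len(cluster) > min_fraction


def pathway_expressed(detections, clusters, pathway_steps):
    """Apply the two conditions: an enzyme of the pathway was detected, and its
    genome belongs to a cluster in which >50% of genomes encode every step."""
    genome_to_cluster = {
        genome: cluster_id
        for cluster_id, members in clusters.items()
        for genome in members
    }
    for genome, step in detections:
        if step not in pathway_steps:
            continue
        cluster_id = genome_to_cluster.get(genome)
        if cluster_id is not None and cluster_supports_pathway(
            clusters[cluster_id], pathway_steps
        ):
            return True
    return False


# Placeholder clusters: genome -> set of encoded pathway steps.
clusters = {
    "cluster_1": {"g1": {"xylA", "xylB"}, "g2": {"xylA", "xylB"}, "g3": {"xylA"}},
    "cluster_2": {"g4": {"xylA", "xylB"}, "g5": {"xylA"}},
}
steps = {"xylA", "xylB"}
print(pathway_expressed([("g3", "xylA")], clusters, steps))  # cluster_1: 2 of 3 complete
print(pathway_expressed([("g5", "xylA")], clusters, steps))  # cluster_2: 1 of 2 complete
```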

The limited number of proteins detected for the fen hydrolysers is likely a 
consequence of the diversity and complexity of the fen community, and the variety 
of proteins produced, compared to the palsa and bog. The metaproteomes were 
searched against a targeted protein database containing only those sequences 
predicted from the MAGs. Peptide-spectrum matching requires 100% matches, which 
means that variations in sequence due to population heterogeneity/diversity, which 
is greatest in the fen (Fig. 1; Extended Data Fig. 2), are not captured, reducing 
the number of successful matches. Our approach is conservative, following the 
genome-centric focus, and is used purely to show that the populations of interest 
are translating proteins in situ. 

Statistical analyses. Analysis of variance (ANOVA) tests, Mann-Whitney U-tests, 
and least squares regressions were calculated using R°”. Box plots were created using 
ggplot2*’, with the boxes representing the 1st to 3rd quartiles and the whiskers 
the highest (upper whiskers) or lowest (lower whiskers) observation within 
1.5× the interquartile range. The centre line represents the mean. Figure 2 sample 
numbers were as follows: palsa surface n = 18, palsa mid n= 18, palsa deep n= 17, 
bog surface n= 19, bog mid n = 20, bog deep n= 20, bog extra-deep n=6, fen 
surface n= 23, fen mid n= 18, fen deep n= 23, fen extra-deep n=6 biologically 
independent samples. A square root scale was used on the y axis. P values for 
significant differences among depths assessed with ANOVA were 4 × 10⁻⁷, 2 × 10->, and 
2 × 10⁻⁵ for palsa, bog and fen respectively for cellulose degradation, and 4 × 10⁻², 
3 × 10⁻⁸ and not applicable (n/a, no significant differences were found) for xylan 
degradation, 5 × 10⁻³, n/a, n/a for xylose degradation (dehydratase), n/a, 2 × 10⁻², 
1 × 10⁻⁴ for xylose degradation (oxidoreductase), 3 × 10⁻⁴, 9 × 10°”, 6 × 10°3 
for xylose degradation (isomerase), 2 × 10⁻⁴, 5 × 10~°, n/a for lactate fermentation, 
n/a, 3 × 10~?, n/a for ethanol fermentation, 8 × 107°, 5 × 107’, n/a for propionate 
fermentation, n/a, 2 × 10⁻⁴, 5 × 10~? for hydrogenotrophic methanogenesis, n/a, 
2 × 10⁻³, 4 × 10°? for acetoclastic methanogenesis. Figure 3 sample numbers were 
as follows: 3a, n= 47, 3b, n= 65, 3c, n=70 biologically independent samples. 
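
The box-plot convention described in this section (boxes spanning the 1st to 3rd quartiles, whiskers ending at the most extreme observation within 1.5× the interquartile range) can be sketched in a few lines. The quartile interpolation method used here (`method="inclusive"`) is an assumption; the plots themselves were produced with ggplot2, not with this code.

```python
import statistics


def box_stats(values):
    """Return quartiles and whisker ends for a box plot of the given values."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers end at the most extreme observations inside the 1.5 x IQR fences.
    lower = min(v for v in data if v >= q1 - 1.5 * iqr)
    upper = max(v for v in data if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "q3": q3, "lower_whisker": lower, "upper_whisker": upper}


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats)
```

In this toy input the value 50 lies beyond the upper fence, so the upper whisker stops at 9.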

To test for associations between microbial populations and geochemical 
variables, the abundance of genus- to family- level lineages was calculated using 
phylogenetic tree insertion of open reading frames derived from individual reads 
into single-copy ribosomal protein marker trees using GraftM** and phylogenetic 
trees derived from both contigs assembled from data presented here and reference 
datasets (data not shown). To avoid statistical complications arising from multiple 
hypothesis testing, only the ten most abundant lineages in the bog and fen were 
tested for significance using least squares regression. Lineages correlating signifi- 
cantly were then linked to MAGs for correlations between individual MAGs and 
geochemical variables as well as between Ca. ‘Acidiflorens’ sub-lineages. 
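
The restriction to the ten most abundant lineages followed by least squares regression can be sketched as below. This is an illustrative outline only: the input layout (a dictionary of per-sample relative abundances), the ranking by mean abundance and the function names are assumptions, and significance testing is omitted.

```python
def top_lineages(abundance, k=10):
    """Return the k lineages with the highest mean relative abundance.

    abundance: dict mapping lineage name -> list of per-sample abundances.
    """
    def mean(name):
        return sum(abundance[name]) / len(abundance[name])

    return sorted(abundance, key=mean, reverse=True)[:k]


def least_squares(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy example: three lineages, keep the top two, regress one against a variable.
abundance = {"lineage_a": [1.0, 1.0], "lineage_b": [5.0, 5.0], "lineage_c": [3.0, 3.0]}
print(top_lineages(abundance, k=2))
print(least_squares([1, 2, 3, 4], [2, 4, 6, 8]))
```

Limiting the number of tested lineages before fitting keeps the number of hypotheses small, which is the stated motivation in the text.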

Figure generation. Manuscript figures were generated using custom R® scripts, 
ggplot2*’, spacemacs (http://spacemacs.org/), Rstudio®, arb®, d3js (https://d3js. 
org/), Inkscape (https://inkscape.org/) and Adobe Illustrator (http://www.adobe. 
com/au/products/illustrator.html). 

Code availability. The above methods indicate the source of the code and 
programs used for analyses within the relevant sections. 

Data availability. Data described in this manuscript have been submitted 
under NCBI BioProject accession number PRJNA386568. MAGs were deposited 
at DDBJ/ENA/GenBank under the accession numbers provided in Supplementary 
Data 3, and the initial versions are described in this paper. The mass spectrometry 
proteomics data have been deposited to the ProteomeXchange Consortium 
(http://www.proteomexchange.org/) via the PRIDE” partner repository with 
the dataset identifier PXD009096 (https://doi.org/10.6019/PXD009096). 
Supplementary Data 1-9 are available with the online version of this manuscript. 
Supplementary Data 10-15 are available on figshare (https://doi.org/10.6084/ 
m9.figshare.6233660). 

Extended Data Fig. 1 | Phylogenetic distribution of MAGs recovered 
from Stordalen Mire. a, Phylogenetic tree of 647 dereplicated MAGs. 
Numbers in parentheses show total MAGs recovered and phylogenetic 
gain of Stordalen MAGs compared to publicly available genomes for 
each phylum. Red text indicates previously poorly represented phyla. 
b, Acidobacteria subtree showing the Ca. ‘Acidiflorens’ lineages. 

c, Eremiobacteraeota subtree incorporating the CARN1*! MAG. 


d, Dormibacteraeota subtree, showing Ca. ‘Changshengia’. e, Subtree of Ca. 
‘Methanoflorentales’ MAGs, and closest neighbouring orders. In b-e, pie 
charts show phylogenetic gain, red lines indicate Stordalen MAGs, black 
lines indicate public genomes, blue triangles indicate clustered public 
genomes and red triangles indicate clustered Stordalen MAGs. Black dots 
indicate bootstrap values 70-100%. 

Extended Data Fig. 2 | Microbial community profile of the thaw 
gradient. a, Relative abundance of each phylum estimated through 
the recovery of 16S rRNA gene reads, averaged within each thaw stage. 
The 15 phyla with the highest relative abundance across all samples 
are shown. b, Number of MAGs recovered from each of the phyla in 
a, showing that broadly, MAGs recovered are from lineages highest 
in abundance. c, Principal coordinates analysis of weighted UniFrac 
compositional differences between samples, based on average coverage 
of each recovered genome of reads mapped to the dereplicated genome 
set. Colours indicate thaw stage: brown = palsa (P), green = sphagnum/ 
bog (S), blue = eriophorum/fen (E). Depth: S = surface, M = mid-depth, 
for PCoA 2. Sample numbers: n= 53, 65 and 70 biologically independent 
samples for palsa, bog and fen, respectively. d, Quantitative PCR analysis 
of samples taken in 2012. The number of cells per gram of soil is shown for 
three depths at the three thaw stages, after correcting for 16S rRNA gene 
copy number variation (see Methods). Fen samples contained significantly 
more cells per gram of soil than bog and palsa samples (average 2.6, 
P=7 x 10°, n= 103, two-sided Mann-Whitney U-test). Sample numbers: 
n=8, 9, and 8 for biologically independent samples palsa surface, mid and 
deep, respectively, n =9, 8, 9 and 10, 7 and 9 for bog and fen, respectively. 
e, f, Relative abundances of phyla and classes within the Proteobacteria 
across the thaw gradient, respectively. The depth of each sample is 
indicated by the colour of the box (surface: red, mid-depth: green, 

deep: blue, extra-deep: purple). Each data point is the sum of relative 
abundances of all lineages assigned to the phylum in a sample after adding 
a 0.1% pseudocount to all phyla (so the y axis is not dominated by small 
values visually). Box plots are shown plotted on a log-scale y axis, with 
phyla and classes ordered by decreasing average relative abundance across 
all samples. Relative abundance was calculated based on the fraction of 
the community with recovered genomes (see Methods). Sample numbers: 
n=53, 65 and 70 biologically independent samples for palsa, bog and fen, 
respectively. 

Extended Data Fig. 3 | Prevalence of individual MAGs across the thaw 
gradient. a, Number of samples where each Stordalen MAG is present at 
>1% relative abundance among each stage of the thaw gradient. Vertical 
red lines indicate the number of samples sequenced in total from that 
environment. Only one MAG, ‘Deltaproteobacteria_fen_1087’, was 
found in a high abundance across fen sites, detected at >1% relative 
abundance in 96% of fen sites. b, The same information stratified by 
depth of the sample in the soil column. The specific MAGs prevalent 

are detailed in Extended Data Table 1, showing that a small number of 
populations were prevalent at a specific depth of a specific site. c, Stordalen 
genomes that changed significantly in abundance with depth. For each 
site, genomes that show the largest absolute difference in abundance 
between shallow and deep samples are shown. Genomes that are more 
abundant in shallow samples compared to deep are positive, and those 
more abundant in deep samples relative to shallow samples are negative. 
Only those lineages with a mean absolute difference of >1% and that 

are significantly different (P < 0.05, two-sided Mann-Whitney U-test) 
are shown. Sample numbers: n = 53, 65 and 70 biologically independent 
samples for palsa, bog and fen, respectively. Each bar indicates a 97% 
dereplicated MAG that changes in relative abundance between surface 
and deep samples and the colour of each indicates the phylum the genome 
belongs to. The fen is less stratified between the surface and deep, which 
is reflected in the fewer population abundances significantly changing 
in abundance between shallow and deep samples. Recovered congeneric 
genomes that showed significant but inverse differential abundance 
between surface and deep samples are shown in Supplementary Data 7. 
Genomes depicted in c in order are Acidobacteria_palsa_348 = 1, 
Acidobacteria_palsa_246 = 2, Actinobacteria_palsa_463 = 3, 
Actinobacteria_palsa_558 = 4, Acidobacteria_palsa_312 = 5, 
Alphaproteobacteria_palsa_929 = 6, Actinobacteria_palsa_504 = 7, 
Acidobacteria_palsa_125 = 8, WPS2_palsa_1515 = 9, 
Acidobacteria_palsa_289 = 10, WPS2_palsa_1516 = 11, 
Acidobacteria_palsa_310 = 12, Alphaproteobacteria_palsa_913 = 13, 
Actinobacteria_palsa_693 = 14, Actinobacteria_palsa_465 = 15, 
Actinobacteria_palsa_691 = 16, Alphaproteobacteria_palsa_895 = 17, 
Actinobacteria_palsa_505 = 18, Actinobacteria_bog_593 = 19, 
Actinobacteria_palsa_462 = 20, Acidobacteria_palsa_199 = 21, 
Acidobacteria_palsa_362 = 22, Acidobacteria_palsa_313 = 23, 
Gammaproteobacteria_palsa_1209 = 24, Acidobacteria_palsa_267 = 25, 
Planctomycetes_palsa_1347 = 26, Acidobacteria_palsa_143 = 27, 
Verrucomicrobia_palsa_1397 = 28, Actinobacteria_palsa_641 = 29, 
Actinobacteria_palsa_733 = 30, Acidobacteria_palsa_420 = 31, 
Actinobacteria_palsa_736 = 32, Verrucomicrobia_palsa_1413 = 33, 
Alphaproteobacteria_palsa_910 = 34, Acidobacteria_palsa_286 = 35, 
Acidobacteria_palsa_122 = 36, Acidobacteria_palsa_343 = 37, 
Deltaproteobacteria_palsa_1114 = 38, Gemmatimonadetes_palsa_1248 = 39, 
Acidobacteria_palsa_340 = 40, Acidobacteria_palsa_141 = 41, 
Alphaproteobacteria_palsa_922 = 42, WPS2_palsa_1496 = 43, 
Actinobacteria_bog_635 = 44, Actinobacteria_bog_766 = 45, 
Actinobacteria_bog_592 = 46, Gammaproteobacteria_bog_1200 = 47, 
Actinobacteria_bog_594 = 48, Acidobacteria_bog_329 = 49, 
Verrucomicrobia_bog_1475 = 50, Actinobacteria_bog_723 = 51, 
Acidobacteria_bog_233 = 52, Verrucomicrobia_bog_1402 = 53, 
WPS2_bog_1492 = 54, Alphaproteobacteria_bog_899 = 55, 
WPS2_bog_1527 = 56, Actinobacteria_bog_769 = 57, 
Acidobacteria_bog_377 = 58, Actinobacteria_bog_637 = 59, 
FCPU426_bog_1183 = 60, Alphaproteobacteria_bog_900 = 61, 
Acidobacteria_bog_234 = 62, WPS2_bog_1502 = 63, 
Verrucomicrobia_bog_1421 = 64, Gammaproteobacteria_bog_1206 = 65, 
Alphaproteobacteria_bog_908 = 66, Betaproteobacteria_bog_994 = 67, 
Acidobacteria_fen_416 = 68, Actinobacteria_fen_548 = 69, 
Acidobacteria_bog_445 = 70, Acidobacteria_bog_96 = 71, 
Acidobacteria_bog_202 = 72, Actinobacteria_fen_455 = 73, 
AD3_bog_854 = 74, Acidobacteria_bog_218 = 75, 
Actinobacteria_bog_806 = 76, Acidobacteria_bog_390 = 77, 
Actinobacteria_bog_524 = 78, Euryarchaeota_bog_81 = 79, 
Verrucomicrobia_bog_1459 = 80, AD3_bog_876 = 81, 
Actinobacteria_bog_808 = 82, Acidobacteria_bog_226 = 83, 
Actinobacteria_bog_576 = 84, Acidobacteria_bog_406 = 85, 
Acidobacteria_fen_408 = 86, Deltaproteobacteria_fen_1088 = 87, 
Nitrospirae_fen_1304 = 88, Bacteroidetes_fen_982 = 89, 
Bacteroidetes_fen_956 = 90, Acidobacteria_fen_335 = 91, 
Euryarchaeota_fen_63 = 92, Gammaproteobacteria_fen_1191 = 93, 
Deltaproteobacteria_fen_1087 = 94, Gammaproteobacteria_fen_1218 = 95, 
Gammaproteobacteria_fen_1219 = 96, Actinobacteria_fen_730 = 97, 
Deltaproteobacteria_fen_1138 = 98, Chloroflexi_fen_1050 = 99, 
Actinobacteria_fen_453 = 100, Acidobacteria_fen_408 = 101, 
Actinobacteria_fen_548 = 102, Chloroflexi_fen_1019 = 103, 
Acidobacteria_fen_414 = 104, 
Actinobacteria_fen_455 = 105. 

Extended Data Fig. 4 | Cellulase, xylanase and β-glucosidase gene 
expression across the thaw gradient. a, b, Cellulase; c, d, xylanase; 
e, f, β-glucosidase. Samples analysed with metatranscriptomics are 
described by the date of sampling, core number and depth. a, c, e, Relative 
contribution of each phylum to the total TPM of the enzyme class 
observed in the metatranscriptomes. b, d, f, Total TPM of all expressed 
genes in the sample. 

Extended Data Fig. 6 | Xylose degradation pathways at Stordalen Mire. 
a, Diagram of xylose degradation pathways. b, Venn diagram showing 
how each xylose breakdown pathway is shared among the Stordalen Mire 
MAGs. Percentages represent the proportion compared to all Stordalen 
genomes encoding a xylose degradation pathway. In the metaproteomes, 
genomes Acidobacteria_bog_390, Actinobacteria_fen_455 and 
Actinobacteria_bog_808 expressed a protein specific to oxidoreductase 
pathways and a protein specific to the isomerase pathway. In the 
metatranscriptomes, Acidobacteria_bog_390, Actinobacteria_fen_455, 
Actinobacteria_bog_586, Actinobacteria_bog_808 and Planctomycetes_fen_1346 
expressed a protein specific to oxidoreductase pathways and a protein 
specific to the isomerase pathway. c–h, Gene expression of xylose 
degradation pathways. Average expression of genes in the canonical 
bacterial xylose isomerase (c, d), oxidoreductase (e, f) and xylonate 
dehydratase pathways (g, h) are depicted across the thaw gradient. Samples 
analysed with metatranscriptomics are described by the date of sampling, 
core number and depth. c, e, g, Relative contribution of each phylum to the 
total TPM of the enzyme class observed in the metatranscriptomes. 

Extended Data Fig. 7 | Gene expression of fermentation pathways. 
Samples analysed with metatranscriptomics are described by the date 
of sampling, core number and depth. a, c, e, g, Total TPM of each 
fermentation pathway in the metatranscriptomes. b, d, f, h, Relative 
contribution of each phylum to the total TPM of each pathway. 

Extended Data Fig. 8 | Correlation of microbial and geochemical data. 
a, CO2 and CH4 concentrations in porewater derived from the bog and 
fen. The blue line shown is a line of best fit, forced through the origin. 
Dots indicate the samples, with colours indicating the sample depth. The 
concentrations are correlated, and the CH4 concentrations are much lower 
than the CO2 concentrations in both sites. Sample numbers: n = 51 (bog) 
and 61 (fen) biologically independent samples. b, Methanogenesis versus 
methanotrophy rates. Each point represents the average relative abundance 
of methanotrophs and methanogens across all samples in a single core, 
multiplied by the rate of methane generation or consumption inferred 
from previous culture-based measurements (2.345 and 20.1 fmol CH4 h⁻¹ 
per cell of methanogenesis and methanotrophy, respectively, see Methods). 
The line represents the 1:1 ratio. Inferred fluxes were calculated using 
relative abundance of methanogenic or methanotrophic lineages so rates 
are only intended for comparison between the x and y axes, rather than 
as an absolute measure of CH4 flux. Methanotrophy appears to mitigate a 
significant proportion of the CH4 generated in the bog sites. c, Correlation 
of the relative abundance of Ca. ‘Methanoflorens stordalenmirensis’ with 
the isotopic fractionation of methane (αC) dissolved in paired porewater 
samples taken from the bog. Previously observed in 2011 using 16S 
rRNA gene amplicon sequencing”, the correlation is confirmed here 
using genome-centric metagenomic techniques on the 2011 samples, 

as well as in a new year of sampling in 2012. Sample numbers: n = 23 
(2011) and 24 (2012) biologically independent samples. d, e, Expression 
of methanogenesis marker gene mcrA across the thaw gradient. Samples 
analysed with metatranscriptomics are described by the date of sampling, 
core number and depth. d, Relative contribution of each methanogenic 
order to the total TPM. e, Relative contribution of all mcrA genes in 

the metatranscriptome. Metaproteomes revealed the expression of 

289 hydrogenotrophic McrA proteins across 13 samples, as well as 78 
acetoclastic McrA proteins across eight samples (Supplementary Data 2). 
f, Linear regression analysis for predicting effective fractionation 
(αC) of CH4 from environmental variables and Ca. ‘Methanoflorens 
stordalenmirensis’ abundances in the bog. Ca. ‘Methanoflorens 
stordalenmirensis’ abundance exceeds bulk geochemical parameters in 
predicting the effective fractionation of CH4. Each line is the result of 
a linear regression of the specified measurement against the αC of CH4 
in bog porewater samples taken in 2011 and 2012 (n = 47 biologically 
independent samples). 
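
The inferred rates plotted in panel b can be reproduced in outline as follows. The two per-cell rates are those quoted in the caption; the input layout (relative abundances per sample within one core) and all names are assumptions made for illustration. As the caption notes, the outputs are only meaningful relative to each other, not as absolute fluxes.

```python
# Per-cell rates quoted in the caption, in fmol CH4 per hour per cell.
METHANOGENESIS_RATE = 2.345
METHANOTROPHY_RATE = 20.1


def inferred_rates(methanogen_abundance, methanotroph_abundance):
    """Return (methanogenesis, methanotrophy) values for one core.

    Each argument is a list of relative abundances, one per sample in the core.
    The mean relative abundance is multiplied by the per-cell rate.
    """
    mean_methanogens = sum(methanogen_abundance) / len(methanogen_abundance)
    mean_methanotrophs = sum(methanotroph_abundance) / len(methanotroph_abundance)
    return (
        mean_methanogens * METHANOGENESIS_RATE,
        mean_methanotrophs * METHANOTROPHY_RATE,
    )


# Hypothetical core with two samples.
production, consumption = inferred_rates([0.02, 0.04], [0.001, 0.003])
print(production, consumption)
```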

Extended Data Fig. 9 | Candidate phylum Dormibacteraeota (AD3) 
genus Ca. ‘Changshengia’ at Stordalen Mire. a, Total relative abundance 
of the genus Ca. ‘Changshengia’ correlated with the fraction of the 
concentration of C mineralized to CO2 versus CH4 in the bog porewater 
samples (R² = 0.19, P = 0.001, n = 51 biologically independent samples). 
Each point represents an individual sample from 2012, with its colour 
representing the depth in the core from which the sample was taken. 

b, Metabolic reconstruction of genomes belonging to the candidate 
phylum AD3 genus Ca. ‘Changshengia’ correlating with the CH4:CO2 
concentration ratio in porewater from 2012 bog samples. Genomes from 
four clades within the AD3 were assembled from across Stordalen Mire. 
Enzyme colour indicates the families that share that metabolic potential, 
as outlined in the legend on the left. Arrow colouring indicates whether 
expression was detected (red arrows) or not detected (black arrows) 

for genes encoding the enzyme in any of the 24 metatranscriptomes. 
Orange stars indicate detection of protein expression in any of the 22 
metaproteomes from the Ca. ‘Changshengia’ and related genomes. All four 
lineages encode the potential to oxidize glycerol anaerobically through 
glycerol transporter (glpF), glycerol kinase (glpK) and a membrane-bound 
glycerol-3-phosphate dehydrogenase (glpABC), entering glycolysis via 
dihydroxyacetone phosphate processed to glyceraldehyde-3-phosphate by 
the triosephosphate isomerase (tpiA). Other glycerol derivatives such as 
glycerol-3-phosphate could be imported (glpT) by this and other family 
members, and dihydroxyacetone phosphate can also be processed using 
the PTS-dependent dihydroxyacetone kinase (dhaLMK) complex. Sinks 
for the electrons generated from the oxidation of glycerol also varied 
between the different lineages, with Ca. ‘Changshengia’ and clade 1 having 
a H+-translocating complex I NADH:oxidoreductase, while clade 1 also 
has a high affinity cytochrome oxidase complex IV, clade 2 genomes 
encode only a nitrate reductase (narGHI) and clade 4 genomes only a 
fumarate reductase (sdhABCD). These differences are likely to lead to the 
differentiation of the niches that each lineage occupies across different 
sites and depths of the mire. Lineages were considered positive for genes 
or complexes based on the presence of sequences with 80% homology in 
50% of the genomes. c, Phylogenetic subtree showing the family groupings 
of AD3 for the metabolic analysis. Representative genomes from the 97% 
average nucleotide identity (ANI) dereplication are indicated in red. 
Bootstrap support is indicated at the nodes for values over 70% or 90% in 
grey and black, respectively. Blue clade indicates cluster of seven UBA and 
RefSeq genomes. 

Extended Data Table 1 | Genomes with high prevalence in specific sites and depths 

Genomes shown are representative genomes from the 97% dereplicated set that are present at >1% relative abundance in >90% of samples from the site and depth shown in the ‘Environment’ 
column. The ‘Num present’ column indicates the number of samples in which it is found and ‘Total samples’ indicates the total number of samples available for that environment. 
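
The selection criterion in this table note (>1% relative abundance in >90% of samples from one site and depth) can be sketched as below. The input layout and names are assumptions for illustration; the returned pair corresponds to the ‘Num present’ and ‘Total samples’ columns.

```python
def prevalent_genomes(abundance, min_abundance=1.0, min_prevalence=0.9):
    """Select genomes present above min_abundance (%) in more than
    min_prevalence of the samples from one environment.

    abundance: dict mapping genome name -> list of relative abundances (%),
    one value per sample. Returns genome -> (num_present, total_samples).
    """
    selected = {}
    for genome, values in abundance.items():
        num_present = sum(1 for value in values if value > min_abundance)
        if num_present / len(values) > min_prevalence:
            selected[genome] = (num_present, len(values))
    return selected


# Hypothetical abundances for two genomes in one environment.
example = {
    "genome_a": [2.0] * 19 + [0.5],  # present in 19 of 20 samples
    "genome_b": [2.0] * 9 + [0.5],   # present in 9 of 10 samples
}
print(prevalent_genomes(example))
```

With strict inequalities, a genome present in exactly 90% of samples is not selected.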

Extended Data Table 2 | Overview of proteins detected using metaproteomics 

Columns: Sample; # of spectra; Top KOs; Top phyla 

Sample: 201206_P123M 
# of spectra: 31 
Top KOs: K02650: pilA; type IV pilus assembly protein PilA, K04077: groEL, HSPD1; chaperonin GroEL, K14028: mdh1, mxaF; methanol dehydrogenase (cytochrome c) subunit 1 
Top phyla: Acidobacteria (71%), AD3 (19.4%), Proteobacteria (6.5%), Euryarchaeota (3.2%) 

Sample: 201206_S123D 
# of spectra: 1687 
Top KOs: K00024: mdh; malate dehydrogenase, K00320: mer; 5,10-methylenetetrahydromethanopterin reductase, K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl-coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K02358: tuf, TUFM; elongation factor Tu, K03737: por, nifJ; pyruvate-ferredoxin/flavodoxin oxidoreductase, K04043: dnaK, HSPA9; molecular chaperone DnaK, K04077: groEL, HSPD1; chaperonin GroEL, K05349: bglX; beta-glucosidase 
Top phyla: Acidobacteria (56.4%), AD3 (21.6%), Euryarchaeota (13.9%), Proteobacteria (3.4%), Actinobacteria (3.2%), Verrucomicrobia (0.5%), Ignavibacteriae (0.3%), Thaumarchaeota (0.2%), WPS2 (0.2%), Bacteroidetes (0.1%) 

Sample: 201206_S123M 
# of spectra: 700 
Top KOs: K00024: mdh; malate dehydrogenase, K00138: aldB; aldehyde dehydrogenase, K02358: tuf, TUFM; elongation factor Tu, K03286: TC.OOP; OmpA-OmpF porin, OOP family, K03737: por, nifJ; pyruvate-ferredoxin/flavodoxin oxidoreductase, K04043: dnaK, HSPA9; molecular chaperone DnaK, K04077: groEL, HSPD1; chaperonin GroEL, K05349: bglX; beta-glucosidase, K14028: mdh1, mxaF; methanol dehydrogenase (cytochrome c) subunit 1, K16087: TC.FEV.OM3, tbpA, hemR, lbpA, hpuB, bhuR, hugA, hmbR; hemoglobin/transferrin/lactoferrin receptor protein 
Top phyla: Acidobacteria (76.1%), AD3 (14.9%), Proteobacteria (3.9%), Euryarchaeota (3.4%), Actinobacteria (1.4%), Ignavibacteriae (0.1%), Verrucomicrobia (0.1%) 

Sample: 201207_E2X 
# of spectra: 101 
Top KOs: K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl-coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K00428: E1.11.1.5; cytochrome c peroxidase, K00864: glpK, GK; glycerol kinase, K01114: plc; phospholipase C, K03286: TC.OOP; OmpA-OmpF porin, OOP family, K11254: H4; histone H4, K12340: tolC; outer membrane protein 
Top phyla: Acidobacteria (45.5%), Euryarchaeota (32.7%), AD3 (17.8%), Proteobacteria (2%), Gemmatimonadetes (1%), Ignavibacteriae (1%) 

Sample: 201207_E3D 
# of spectra: 153 
Top KOs: K00193: cdhC; acetyl-CoA decarbonylase/synthase complex subunit beta, K00320: mer; 5,10-methylenetetrahydromethanopterin reductase, K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl-coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K00864: glpK, GK; glycerol kinase, K01895: ACSS, acs; acetyl-CoA synthetase, K02040: pstS; phosphate transport system substrate-binding protein, K02731: PSMA7; 20S proteasome subunit alpha 4, K03622: ssh10b; archaea-specific DNA-binding protein 
Top phyla: Euryarchaeota (73.9%), Acidobacteria (17%), AD3 (7.8%), Actinobacteria (0.7%), Proteobacteria (0.7%) 

Sample: 201207_E3M 
# of spectra: 537 
Top KOs: K00024: mdh; malate dehydrogenase, K00193: cdhC; acetyl-CoA decarbonylase/synthase complex subunit beta, K00320: mer; 5,10-methylenetetrahydromethanopterin reductase, K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl-coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K00864: glpK, GK; glycerol kinase, K01895: ACSS, acs; acetyl-CoA synthetase, K03737: por, nifJ; pyruvate-ferredoxin/flavodoxin oxidoreductase, K04077: groEL, HSPD1; chaperonin GroEL 
Top phyla: Euryarchaeota (45.3%), Acidobacteria (26.6%), AD3 (17.3%), Proteobacteria (5.4%), Chloroflexi (1.3%), Ignavibacteriae (1.1%), Verrucomicrobia (1.1%), Nitrospirae (0.7%), Actinobacteria (0.6%), Bacteroidetes (0.4%) 

Sample: 201207_P3D 
# of spectra: 321 
Top KOs: K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K00406: ccoP; cytochrome c oxidase cbb3-type subunit III, K00428: E1.11.1.5; cytochrome c peroxidase, K02650: pilA; type IV pilus assembly protein PilA, K03286: TC.OOP; OmpA-OmpF porin, OOP family, K04077: groEL, HSPD1; chaperonin GroEL, K10696: BRE1; E3 ubiquitin-protein ligase BRE1, K16087: TC.FEV.OM3, tbpA, hemR, lbpA, hpuB, bhuR, hugA, hmbR; hemoglobin/transferrin/lactoferrin receptor protein, K16089: TC.FEV.OM2, cirA, cfrA, hmuR; outer membrane receptor for ferrienterochelin and colicins, K16090: fiu; catecholate siderophore receptor 
Top phyla: Acidobacteria (80.7%), AD3 (14%), Euryarchaeota (4.7%), Proteobacteria (0.6%) 

Sample: 201207_P3M 
# of spectra: 78 
Top KOs: K00600: glyA, SHMT; glycine hydroxymethyltransferase, K02358: tuf, TUFM; elongation factor Tu, K02601: nusG; transcriptional antiterminator NusG, K03046: rpoC; DNA-directed RNA polymerase subunit beta’, K03704: cspA; cold shock protein (beta-ribbon, CspA family), K04043: dnaK, HSPA9; molecular chaperone DnaK, K04077: groEL, HSPD1; chaperonin GroEL, K06006: cpxP; periplasmic protein CpxP, K14028: mdh1, mxaF; methanol dehydrogenase (cytochrome c) subunit 1, K17734: aprX; serine protease AprX 
Top phyla: Acidobacteria (64.1%), AD3 (29.5%), Euryarchaeota (3.8%), Proteobacteria (2.6%) 

Sample: 201207_S1D 
# of spectra: 1563 
Top KOs: K00024: mdh; malate dehydrogenase, K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl-coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K02358: tuf, TUFM; elongation factor Tu, K03286: TC.OOP; OmpA-OmpF porin, OOP family, K03737: por, nifJ; pyruvate-ferredoxin/flavodoxin oxidoreductase, K04043: dnaK, HSPA9; molecular chaperone DnaK, K04077: groEL, HSPD1; chaperonin GroEL, K05349: bglX; beta-glucosidase 
Top phyla: Acidobacteria (58.5%), Euryarchaeota (17.1%), AD3 (16.9%), Proteobacteria (3.2%), Actinobacteria (3%), Verrucomicrobia (0.6%), Ignavibacteriae (0.3%), Bacteroidetes (0.2%), FCPU426 (0.1%), Chloroflexi (0.1%) 

Sample: 201207_S2M 
# of spectra: 797 
Top KOs: K00024: mdh; malate dehydrogenase, K01114: plc; phospholipase C, K02358: tuf, TUFM; elongation factor Tu, K03286: TC.OOP; OmpA-OmpF porin, OOP family, K03695: clpB; ATP-dependent Clp protease ATP-binding subunit ClpB, K03737: por, nifJ; pyruvate-ferredoxin/flavodoxin oxidoreductase, K04043: dnaK, HSPA9; molecular chaperone DnaK, K04077: groEL, HSPD1; chaperonin GroEL, K04749: rsbV; anti-sigma B factor antagonist, K14028: mdh1, mxaF; methanol dehydrogenase (cytochrome c) subunit 1 
Top phyla: Acidobacteria (78.3%), AD3 (13%), Proteobacteria (5%), Euryarchaeota (2%), Actinobacteria (1.1%), Ignavibacteriae (0.3%), Chloroflexi (0.1%), Verrucomicrobia (0.1%) 

Sample: 201208_E2D 
# of spectra: 725 
Top KOs: K00319: mtd; methylenetetrahydromethanopterin dehydrogenase, K00320: mer; 5,10-methylenetetrahydromethanopterin reductase, K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl-coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K00864: glpK, GK; glycerol kinase, K03737: por, nifJ; pyruvate-ferredoxin/flavodoxin oxidoreductase, K04077: groEL, HSPD1; chaperonin GroEL, K09495: CCT3, TRIC5; T-complex protein 1 subunit gamma, K16087: TC.FEV.OM3, tbpA, hemR, lbpA, hpuB, bhuR, hugA, hmbR; hemoglobin/transferrin/lactoferrin receptor protein 
Top phyla: Euryarchaeota (50.5%), Acidobacteria (27.4%), AD3 (17.2%), Proteobacteria (1.5%), Chloroflexi (1.2%), Actinobacteria (0.6%), Ignavibacteriae (0.4%), Verrucomicrobia (0.4%), Fibrobacteres (0.4%), Planctomycetes (0.1%) 

Sample: 201208_E2M 
# of spectra: 789 
Top KOs: K00193: cdhC; acetyl-CoA decarbonylase/synthase complex subunit beta, K00320: mer; 5,10-methylenetetrahydromethanopterin reductase, K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl-coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K00864: glpK, GK; glycerol kinase, K01895: ACSS, acs; acetyl-CoA synthetase, K03737: por, nifJ; pyruvate-ferredoxin/flavodoxin oxidoreductase, K04077: groEL, HSPD1; chaperonin GroEL, K14028: mdh1, mxaF; methanol dehydrogenase (cytochrome c) subunit 1 
Top phyla: Euryarchaeota (57.5%), Acidobacteria (22.2%), AD3 (13.1%), Proteobacteria (4.1%), Actinobacteria (1%), Verrucomicrobia (0.5%), Ignavibacteriae (0.4%), Chloroflexi (0.4%), Nitrospirae (0.3%), Bacteroidetes (0.3%) 

Sample: 201208_E3D 
# of spectra: 108 
Top KOs: K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl-coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, K03046: rpoC; DNA-directed RNA polymerase subunit beta’, K03286: TC.OOP; OmpA-OmpF porin, OOP family, K04077: groEL, HSPD1; chaperonin GroEL, K06882: uncharacterized protein, K09495: CCT3, TRIC5; T-complex protein 1 subunit gamma, K14028: mdh1, mxaF; methanol dehydrogenase (cytochrome c) subunit 1, K16087: TC.FEV.OM3, tbpA, hemR, lbpA, hpuB, bhuR, hugA, hmbR; hemoglobin/transferrin/lactoferrin receptor protein 
Top phyla: Acidobacteria (50%), Euryarchaeota (25.9%), AD3 (18.5%), Proteobacteria (3.7%), Actinobacteria (0.9%), Fibrobacteres (0.9%) 

K00024: mdh; malate dehydrogenase, K01114: plc; phospholipase C, K02338: dnaN; DNA Acidobacteria (80%), AD3 (15.6%), 
polymerase III subunit beta, KO3046: rpoC; DNA-directed RNA polymerase subunit beta’, KO3286: Proteobacteria (1.9%), Euryarchaeota 
201208_P2M 160 TC.OOP; OmpA-OmpF porin, OOP family, K04043: dnaK, HSPAQ; molecular chaperone Dnak, (1.2%), Chloroflexi (0.6%), Actinobacteria 
K04077: groEL, HSPD1; chaperonin GroEL, KO6006: cpxP; periplasmic protein CpxP, K14028: mdh1, (0.6%) 
mxaF; methanol dehydrogenase (cytochrome c) subunit 1, K17734: aprX; serine protease AprX 
K00024: mdh; malate dehydrogenase, K00320: mer; 5,10-methylenetetrahydromethanopterin Acidobacteria (62.8%), AD3 (16.7%), 
reductase, KO0399: mcrA; methyl-coenzyme M reductase alpha subunit, KO0401: mcrB; methyl- Euryarchaeota (14.2%), Proteobacteria 
201208 S1M 2034 coenzyme M reductase beta subunit, K00402: mcrG; methyl-coenzyme M reductase gamma subunit, (3.2%), Actinobacteria (2%), Ignavibacteriae 
— K01114: plc; phospholipase C, KO2358: tuf, TUFM; elongation factor Tu, K03737: por, nifJ; pyruvate- (0.2%), Verrucomicrobia (0.2%), 
ferredoxin/flavodoxin oxidoreductase, K04043: dnaK, HSPAQ; molecular chaperone DnaK, K04077: Thaumarchaeota (0.2%), Bathyarchaeota 
groEL, HSPD1; chaperonin GroEL (0.1%), Bacteroidetes (0.1%) 
K00024: mdh; malate dehydrogenase, K00320: mer; 5,10-methylenetetrahydromethanopterin Acidobacteria (44.3%), Euryarchaeota 
reductase, K00399: mcrA; methyl-coenzyme M reductase alpha subunit, K00401: mcrB; methyl- (33.7%), AD3 (16.4%), Actinobacteria 
201208 S1X 3951 coenzyme M reductase beta subunit, KO0402: mcrG; methyl-coenzyme M reductase gamma subunit, (2.5%), Proteobacteria (1.7%), 


The third column shows the ten most abundant KEGG Orthology (KO) groups detected, where the total spectral count for that group was two or more. The fourth column shows the relative abundance 


K02358: tuf, TUFM; elongation factor Tu, KO3286: TC.OOP; OmpA-OmpF porin, OOP family, 
K03737: por, nifJ; pyruvate-ferredoxin/flavodoxin oxidoreductase, K04043: dnaK, HSPAQ; molecular 
chaperone Dnak, K04077: groEL, HSPD1; chaperonin GroEL 


of spectral counts from each phylum. 
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Verrucomicrobia (0.4%), Ignavibacteriae 
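As a rough illustration of how the fourth column could be derived, the sketch below converts per-phylum spectral counts into percentages of the sample total. The function name `relative_abundance` and the counts are hypothetical and are not taken from the study; rounding to one decimal place is an assumption based on the values shown in the table.

```python
def relative_abundance(counts):
    """Convert raw spectral counts per phylum into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no spectral counts to normalise")
    # Rank phyla from most to least abundant, as in the table's fourth column.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(phylum, round(100.0 * n / total, 1)) for phylum, n in ranked]


# Hypothetical counts, chosen only to illustrate the calculation.
example_counts = {"AD3": 15, "Acidobacteria": 80, "Proteobacteria": 5}
for phylum, percent in relative_abundance(example_counts):
    print(f"{phylum} ({percent}%)")
```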
Leukaemia hijacks a neural mechanism
to invade the central nervous system


Hisayuki Yao, Trevor T. Price, Gaia Cantelli, Brandon Ngo, Matthew J. Warner, Lindsey Olivere, Sarah M. Ridge,
Elizabeth M. Jablonski, Joseph Therrien, Stacey Tannheimer, Chad M. McCall, Anjen Chenn & Dorothy A. Sipkins


Acute lymphoblastic leukaemia (ALL) has a marked propensity to metastasize to the central nervous system (CNS). In 
contrast to brain metastases from solid tumours, metastases of ALL seldom involve the parenchyma but are isolated to 
the leptomeninges, which is an infrequent site for carcinomatous invasion. Although metastasis to the CNS occurs across 
all subtypes of ALL, a unifying mechanism for invasion has not yet been determined. Here we show that ALL cells in the 
circulation are unable to breach the blood-brain barrier in mice; instead, they migrate into the CNS along vessels that 
pass directly between vertebral or calvarial bone marrow and the subarachnoid space. The basement membrane of these 
bridging vessels is enriched in laminin, which is known to coordinate pathfinding of neuronal progenitor cells in the 
CNS. The laminin receptor α6 integrin is expressed in most cases of ALL. We found that α6 integrin-laminin interactions
mediated the migration of ALL cells towards the cerebrospinal fluid in vitro. Mice with ALL xenografts were treated
with either a PI3Kδ inhibitor, which decreased α6 integrin expression on ALL cells, or specific α6 integrin-neutralizing
antibodies and showed significant reductions in ALL transit along bridging vessels, blast counts in the cerebrospinal fluid
and CNS disease symptoms despite minimally decreased bone marrow disease burden. Our data suggest that α6 integrin
expression, which is common in ALL, allows cells to use neural migratory pathways to invade the CNS.


In the absence of CNS-directed prophylactic treatment, involvement 
of the CNS during disease progression occurs in 30-70% of patients 
with ALL¹. Metastasis to the CNS is observed in all subtypes of ALL,
suggesting that lymphoblasts utilize a conserved molecular mechanism
to invade the CNS. ALL relapse in the CNS predicts poor outcomes,
but treatment options remain limited.

The enzyme PI3K has an important role in transducing extracellular
signals that regulate cell growth and survival. The delta isoform of
PI3K (PI3Kδ) is uniquely expressed in immune cells and neurons, and
the PI3Kδ inhibitor idelalisib is approved to treat indolent B cell
lymphomas. The efficacy of idelalisib in acute B cell malignancies has not
yet been established. Therefore, we examined the role of PI3Kδ inhibition
in the Nalm-6 model of ALL. The Nalm-6 pre-B ALL cell line was
derived from a patient who suffered a CNS disease relapse. It generates 
a reproducible pattern of disease in severe combined immunodeficiency 
(SCID) mice that mimics ALL development in patients, including CNS 
invasion. At approximately 40 days after intravenous engraftment of 
Nalm-6 cells, all mice show symptoms of CNS involvement, which 
occur before death from progressive bone marrow disease, and lower 
extremity paresis is the usual clinical end point for euthanasia. 


PI3Kδ blockade reduces CNS disease burden

Using a PI3Kδ inhibitor tool compound, GS-649443, that has a suitable
pharmacokinetic profile in mice, we treated Nalm-6-engrafted
mice daily with vehicle or inhibitor beginning on day 1 or day 20 after
engraftment (Fig. 1a) and continuing until the clinical end point for
euthanasia was reached. We observed an almost 50% prolongation
of survival in early and late GS-649443 treatment arms, with most
GS-649443-treated mice euthanized secondary to bone marrow failure
(Fig. 1b). Notably, the incidence of CNS disease symptoms at time
of euthanasia in treated mice was decreased by approximately six- and
threefold when comparing control and drug treatment in the early and
late treatment-initiation groups, respectively (Fig. 1c).
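
The survival comparison in Fig. 1b is based on Kaplan-Meier curves. The following is a minimal sketch of the Kaplan-Meier product-limit estimator, not the authors' analysis code; the function name and the survival times are hypothetical and serve only to show how such a curve is computed.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times  -- day on which each animal left the study
    events -- True if the end point was reached, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical survival times in days (not data from the study).
vehicle_days = [38, 40, 40, 41, 42, 43]
vehicle_events = [True] * len(vehicle_days)  # every animal reached the end point
for day, surv in kaplan_meier(vehicle_days, vehicle_events):
    print(f"day {day}: S(t) = {surv:.2f}")
```

Comparing two such curves, as in the log-rank test named in the figure legend, would need an additional step that is not shown here.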

To determine whether the low incidence of CNS involvement in 
these mice reflected delayed disease progression in the periphery, we 
compared the tumour burden in bone marrow, spleen and CNS of 
GS-649443- or vehicle-treated mice at matched time points. Paired 
PI3Kδ inhibitor- and vehicle-treated mice were euthanized simultaneously
when either reached a clinical end point. There was no significant
difference between bone marrow or splenic Nalm-6 disease burden or
peripheral blood cell counts in inhibitor- compared to vehicle-treated
mice (Fig. 1d, e and Extended Data Fig. 1a, b). By contrast, there was
an approximately 50% decrease in CNS disease burden among paired
mice (Fig. 1e and Extended Data Fig. 1c).
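
The matched-pair design described above is the setting for a paired Student's t-test (the test named for Fig. 1e). A minimal sketch of the test statistic follows; the function name and the cell counts are hypothetical, and converting t to a P value would additionally require the t distribution, which is omitted here.

```python
import math
import statistics


def paired_t_statistic(vehicle, treated):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    if len(vehicle) != len(treated):
        raise ValueError("paired samples must have equal length")
    # Work on within-pair differences (treated minus vehicle).
    diffs = [t - v for v, t in zip(vehicle, treated)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t_stat = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical CNS blast counts for five matched pairs (not data from the study).
vehicle_counts = [120, 150, 90, 200, 160]
treated_counts = [60, 80, 50, 90, 85]
t_stat, dof = paired_t_statistic(vehicle_counts, treated_counts)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```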

To confirm that our results were not isolated to the Nalm-6 cell line, 
we next tested the efficacy of PI3Kδ inhibition in the RCH-ACV ALL
model, which exhibits symptomatic CNS involvement with similar frequency,
and in non-obese diabetic/SCID/IL-2R mice engrafted with
primary human ALL cells. Similar to our results in Nalm-6-engrafted
mice, single-agent PI3Kδ inhibition markedly decreased CNS disease in
these models, but only minimally diminished peripheral disease burden
(Fig. 1f, g and Extended Data Fig. 1d-i). These findings suggested that
GS-649443 inhibited progression of ALL in the CNS independently of
its effects on peripheral disease. Initially, we hypothesized that PI3Kδ
signalling was necessary for ALL proliferation or survival in the CNS
microenvironment. We therefore first examined whether the PI3Kδ
inhibitor GS-649443 could penetrate the blood-brain barrier in healthy
or leukaemic mice. Whereas mean serum concentrations of GS-649443 
were approximately 200-350 nM (Fig. 2a and Extended Data Fig. 2a), 
compound concentrations in the brain tissue of healthy or end-stage 
leukaemic mice were less than 5 nM, well below inhibitory concentrations
(Extended Data Fig. 2c, d). Analysis of blasts in the cerebrospinal
fluid (CSF) isolated from GS-649443- and vehicle-treated mice revealed
no differences in apoptotic or mitotic indices, further indicating that the
pronounced effect of PI3Kδ inhibition on CNS disease progression did
not occur through direct effects on ALL growth or viability (Extended
Data Fig. 2e-g). We therefore turned our attention to the potential function
of PI3Kδ in ALL metastasis from the periphery to the CNS.

Fig. 1 | PI3Kδ inhibition blocks progression of ALL in the CNS in vivo.
a, Schematic of treatment of mice with GS-649443. b, Kaplan-Meier
survival curves. Two-sided log-rank Mantel-Cox test, n = 6 mice per
treatment group, late treatment: P = 0.0039, early treatment: P = 0.0146.
c, Incidence of hindlimb paralysis at time of euthanasia. n = 6 mice per
treatment group. d, Nalm-6 ALL cells in subarachnoid space (dashed line)
of the spinal cord (SC). Representative sections of the vertebrae are shown.
H&E, haematoxylin and eosin. n = 14 GS-649443-treated mice, 7 vehicle-treated
mice. e, Disease burden (number of CD10+ ALL cells) in the
CNS, bone marrow and spleen of vehicle- and GS-649443-treated Nalm-6-engrafted
mice euthanized at matched time points. Paired two-sided
Student's t-test, n = 5 mice per treatment group, P = 0.0473.
f, Primary (1°) human and RCH-ACV ALL in subarachnoid space of
spinal cord. Representative sections of the vertebrae are shown. Primary
ALL: n = 6 GS-649443-treated mice, 7 vehicle-treated mice; RCH-ACV:
n = 4 mice per treatment group. g, Disease burden (indicated by CD10+ or
CD19+ ALL cells) in the CNS, bone marrow and spleen of vehicle- or
GS-649443-treated mice engrafted with primary human ALL and RCH-ACV
ALL cells. Paired two-sided Student's t-test, primary human cells: n = 5
mice per treatment group; RCH-ACV: n = 4 mice per treatment group.
*P < 0.05, **P < 0.01. Scale bars, 100 μm.


PI3K modulates the motility of ALL cells

In addition to its roles in cell growth, PI3K is a key mediator of migration,
controlling actomyosin contractility through the downstream
Rho and FAK pathways. We therefore evaluated the role of PI3Kδ
in ALL cell motility in vitro. PI3Kδ inhibition with either GS-649443
or idelalisib decreased chemotaxis of ALL cell lines and primary ALL
cells in transwell migration assays (Fig. 2b and Extended Data Fig. 2b).
ALL migration was independent of both AKT and ROCK, yet highly
dependent on MLCK activity, suggesting that the FAK signalling pathway
downstream of PI3K mediated the inhibition of ALL migration
(Fig. 2b and Extended Data Fig. 2b, h, i). Western blot analysis indicated
that myosin light chain (MLC) activity in GS-649443-treated ALL cells
was reduced (Extended Data Fig. 3a), leading us to hypothesize that
PI3Kδ inhibition in vivo blocked the development of CNS disease by
broadly paralysing the motility of ALL cells. In previously published
work, we have shown that ALL and other tumour cells metastasize to
the bone marrow through sinusoidal vasculature located in specific
anatomic regions. Proliferating cells migrate away from these sinusoidal
vessels over time¹¹,¹². We hypothesized that if ALL cell motility was
broadly inhibited by PI3Kδ blockade, then proliferating Nalm-6 cells in
the bone marrow of treated mice would fail to migrate from the sites of
initial engraftment. We used intravital microscopy to track the location
of ALL cells in the bone marrow at serial time points after engraftment 
in mice, but found no significant differences in intra-bone marrow 
migration in treated mice compared to mice treated with vehicle control
(Fig. 2c and Extended Data Fig. 3b). These data suggested that ALL
cells had not lost all migratory potential in vivo as a consequence of
PI3K inhibition, but that ALL migration into the CNS was regulated by
a specific mechanism(s) upstream of MLCK. To identify candidate molecules,
we performed a microarray analysis of blasts isolated from the
bone marrow and CSF of Nalm-6-engrafted mice. Genes in both focal
adhesion and contractility signalling pathways were strongly downregulated
in the bone marrow and CSF blasts of GS-649443-treated
mice (Extended Data Fig. 4a-c). Among multiple genes within the focal
adhesion pathway was ITGA6, which is intriguing because of its dual
relationship during cell migration and CNS development.


PI3K regulates α6 integrin in ALL cells

ITGA6 encodes the α6 integrin, which dimerizes with β1 or β4 integrin
to form receptors that specifically bind to the extracellular matrix (ECM)
molecule laminin. Notably, Itga6 knockout mice display developmental
defects in cerebral cortical organization, with abnormal neurite
outgrowths studding the surface of the brain. In vitro, neural stem/precursor
cells (NSPCs) have been shown to migrate selectively along
laminin matrices in an α6 integrin-dependent manner. α6 integrin
has not been extensively studied in leukaemia, but analysis of samples
obtained from patients with primary ALL has demonstrated that it is
expressed by the majority of B and T ALL and that its expression is persistent
or intensified in residual disease following chemotherapy.
Fig. 2 | PI3Kδ regulates aspects of ALL cell motility and expression
of the laminin receptor α6 integrin. a, Serum and brain tissue
concentrations of GS-649443 in leukaemic mice. Data are mean ± s.e.m.;
n = 5 mice per group. b, Effect of PI3Kδ and Akt inhibition on transwell
migration of ALL. Data are mean ± s.e.m.; ANOVA with Tukey test, n = 3
biologically independent experiments; *P < 0.05, **P < 0.01. c, Nalm-6
migration in calvarial bone marrow over 10 days in vehicle- versus
GS-649443-treated mice. Data are mean of the sample means ± s.e.m.; n = 3
mice per treatment group, data points from individual mice distinguished
by colour, paired two-sided Student's t-test. d, α6 integrin expression
of ALL cells in brain and bone marrow (BM) in vivo. Representative
immunohistochemistry sections are shown. Arrowheads indicate ALL
cells in meninges. Nalm-6 brain: n = 7 mice, Nalm-6 bone marrow: n = 12
mice, primary ALL bone marrow: n = 4 mice, RCH-ACV bone marrow:
n = 11 mice. e, Flow cytometry of ALL α6 integrin expression following
in vitro treatment with PI3Kδ inhibitors. Paired two-sided Student's t-test,
Nalm-6: n = 3 biological replicates, P = 0.0088; RCH-ACV: n = 6 biological
replicates, P = 0.0026. f, Western blot of MLC2 levels in Nalm-6 after
treatment with α6 integrin-neutralizing antibodies. n = 3 independent
experiments. p-MLC2, phosphorylated-myosin light chain 2 (Thr18/
Ser19). For gel source data, see Supplementary Fig. 1. Scale bars, 100 μm.

To examine whether α6 integrin is a viable candidate regulator of
invasion of the CNS by ALL cells, we first assayed its expression in our
xenograft models. Nalm-6, RCH-ACV and primary human ALL cells
are strongly positive for membrane expression of α6 integrin in vitro
and in vivo (Fig. 2d and Extended Data Figs. 4d and 5a, b). We next
confirmed that PI3Kδ inhibition with GS-649443 decreases the cell
surface expression of α6 integrin by ALL cells (Fig. 2e). Moreover, treatment
of Nalm-6 with α6 integrin-neutralizing antibodies decreased
phosphorylated MLC2 levels, suggesting that α6 integrin signalling
could directly modulate cytoskeletal responses in ALL cells (Fig. 2f).
Taken together, these data suggest a feedforward pathway by which
PI3Kδ and α6 integrin control migration of ALL cells along laminin
through regulation of actomyosin contractility.


ALL cells fail to breach blood-brain barrier

In the adult nervous system, laminin is localized in the ECM of parenchymal
microvessels, meninges, the choroid plexus and peripheral
nerve sheaths. Because ALL cells can circulate in the vasculature in
high numbers, we hypothesized that their metastasis to the CNS was
haematogenous, mediated through the interaction of α6 integrin+ ALL
cells with laminin+ microvessels in the brain. To test this, we engrafted
fluorescently labelled ALL cells intravenously in mice and used confocal
microscopy to image whole mounts of the brain tissue at serial
time points after engraftment. The location of ALL cells with respect
to the vasculature was determined by injecting a fluorescent vascular
blood-pool agent immediately before imaging. We found that ALL cells
quickly arrest at branch points in small microvessels, similar to the CNS
metastatic process described in melanoma and lung cancer models
that show brain metastasis (Fig. 3a-c and Extended Data Fig. 6a-c). In
contrast to solid tumour cells, however, ALL cells fail to enter the brain
parenchyma and to form proliferative lesions.

To determine whether ALL cells cross the blood-brain barrier through
leptomeningeal rather than cerebral parenchymal vasculature, we
engrafted mice with fluorescent Nalm-6 cells and performed real-time in
vivo microscopy of the leptomeningeal vasculature through thinned skull
windows (Extended Data Fig. 7a-d, f and Supplementary Videos 1-5).
Although ALL cells circulated through and transiently arrested inside
leptomeningeal vasculature on the day of intravenous engraftment,
Nalm-6 cells did not breach the leptomeningeal blood-brain barrier
during this or subsequent post-engraftment time points (Extended Data
Fig. 7b-d, f). By contrast, Nalm-6 cells rapidly invaded via diapedesis
through the bone marrow vasculature soon after intravenous injection
(Extended Data Fig. 7e). Histochemical staining of brain tissue sections
confirmed that Nalm-6 cells were absent from leptomeningeal tissue
until late disease stages (Fig. 3d-f and Extended Data Fig. 6d, e).

The choroid plexus vasculature has been reported as a site of entry 
for normal immune cells during CNS trafficking and for haematogenous
metastases of solid tumour cells to the meninges. We therefore
analysed histologic sections of the choroid plexus for the presence
of Nalm-6 cells at multiple time points after engraftment. Nalm-6 cells
were very rarely detected within either choroid vessels or choroid tissue
at time points ranging from the day of intravenous engraftment
to end stage disease (Fig. 3d-f).


ALL cells enter the CNS along emissary vessels

We then turned our attention to identifying an alternative route of invasion.
Investigators interested in defining the earliest anatomic location
of ALL disease in the CNS have studied histologic brain sections of
patients and found the first identifiable ALL cells to be localized within
the basement membrane of superficial arachnoid veins. Notably,
arachnoid veins have been described to merge directly with emissary
veins that pass between the meninges en route to the bone marrow
via the adjacent skull or vertebral bodies (Fig. 4a). We thus hypothesized
that, rather than metastasizing within the circulation, ALL cells
migrated directly from the bone marrow to the CNS surrounding the
circulation, along the laminin+ ECM of emissary bridging vessels. We
also hypothesized that directional migration of ALL cells from the bone
marrow through these bony channels was due to a potent chemoattractive
stimulus within the CSF. Multiple chemokines are indeed present in
the CSF, and a wide variety of their cognate receptors are expressed by
ALL cells. With the exception of the Notch-driven CCR7+ subset of
T-ALL, however, no specific chemokine receptor appears to be necessary
or sufficient to drive CNS invasion, suggesting that CSF chemokines
function additively, synergistically or substitutively to attract ALL
cells. Consistent with our model, we found that whereas CXCL12
was one of the most highly concentrated chemokines in the CSF (mean,
646 pg ml⁻¹), in vitro invasion of ALL cells towards human CSF is
only partially inhibited by AMD3100 blockade of the CXCL12 receptor,
CXCR4 (Fig. 4b). Moreover, it has previously been shown that in
vivo blockade of CXCR4 by continuous infusion AMD3100 does not
decrease development of CNS disease in ALL xenografts.

We next sought to identify emissary bridging vessels in mice and
determine whether ALL cells could be found in transit surrounding
these, and whether these vessels were indeed laminin+. As shown in
histologic cross-sections in Fig. 4c and Supplementary Videos 6-11,
we detected small vessels transiting between the bone marrow and
subarachnoid space in control mice. In these locations in leukaemic
mice, we found numerous openings in the vertebral cortical bone that
were filled with ALL cells, which appeared to be in transit between
the involved bone marrow and subarachnoid space (Fig. 4c, d and
Supplementary Videos 8-11). Immunohistochemical staining confirmed
that these channels contained αSMA+ (vascular smooth muscle)
and laminin+ vasculature (Fig. 4e).

Laminin substantially increases in vitro migration of NSPCs in comparison
to other ECM substrates. We found that laminin similarly
enhanced ALL motility in in vitro invasion assays (Extended Data
Fig. 8a, b). Moreover, ALL migration to human CSF along laminin
was α6 integrin-dependent and blocked by MLCK inhibition or
GS-649443 treatment (Fig. 5a and Extended Data Fig. 8b). Providing
further support that ALL migration utilizes α6 integrin-laminin-mediated
mechanisms, we found that the frequency of ALL infiltrates
along the bone marrow-CNS vascular corridors was significantly
decreased in mice treated with GS-649443 (Fig. 5b).

Fig. 3 | ALL cells fail to breach the blood-brain barrier.
a, b, Representative confocal microscopy images and quantification of
ALL cells located within brain microvessels or parenchyma at various time
points after intravenous engraftment. Green, GFP+ Nalm-6 cells; red,
vessels indicated by dextran-Alexa Fluor 647; blue, DiR+ primary ALL
cells. Data are mean of sample means (cells per mouse) ± s.e.m.; n = 3
mice per cell line and time point, 12 micrographs per mouse. c, Human (h)
CD10+ ALL cells (arrowhead) detected inside lumen of brain parenchymal
vessels by immunohistochemistry. Nalm-6: n = 4 mice; primary ALL:
n = 10 mice. d, e, Quantification of Nalm-6-GFP+ ALL cells within tissue
or inside the lumen of blood vessels in the choroid plexus, leptomeninges
or brain parenchyma at various time points after intravenous engraftment.
Data are mean of sample means (cells per section per mouse) ± s.e.m.;
day 0, n = 3 mice, 19 sections in total; day 3, n = 3 mice, 20 sections; day
10, n = 3 mice, 17 sections; end stage disease, n = 5 mice, 46 sections.
f, Representative GFP immunohistochemistry staining of choroid plexus,
leptomeninges and brain parenchyma from Nalm-6-GFP-engrafted mice
on day 0 after intravenous engraftment (n = 3 mice) or at end stage disease
(n = 5 mice). Arrowheads indicate Nalm-6 ALL cells within leptomeninges
or inside the lumen of brain parenchymal microvessels. Scale bars, 100 μm.

58 | NATURE | VOL 560 | 2 AUGUST 2018


α6 integrin enables abluminal invasion

To specifically examine the role of α6 integrin in CNS invasion
of ALL cells in vivo, Nalm-6-engrafted mice were treated weekly
with α6 integrin-neutralizing antibodies beginning on day 1 after
engraftment until attainment of a clinical end point (Extended
Data Fig. 9a). Although there was no difference in peripheral
disease burden between targeted and isotype control antibody-treated
mice, anti-α6 integrin-treated mice showed a modest
increase in survival, and none developed lower extremity paresis
by the time of euthanasia (Extended Data Fig. 9b-d). By contrast,
100% of isotype control-treated mice required euthanasia due to
paralysis (Fig. 5c). Consistent with this observation, CSF blast counts
in anti-α6 integrin-treated mice were reduced by approximately one
third (Fig. 5d).

To further address the role of α6 integrin in CNS invasion, we studied
the relationship between α6 integrin expression on ALL cells and the
incidence of CNS disease in xenograft models and in a retrospective,
case-control cohort of patients with ALL who had CNS relapse.
Mice engrafted with ALL cells that express high surface levels of α6
integrin (>25% cells positive by flow cytometry and 2-3+ staining
intensity by immunohistochemistry) displayed a higher frequency of
CNS disease at clinical end point compared to mice engrafted with
ALL cells that expressed low levels of α6 integrin (<10% cells positive
by flow cytometry and 0-1+ staining intensity by immunohistochemistry;
Fig. 5e and Extended Data Fig. 5b, c). In archived bone
marrow biopsies from patients with ALL who either did or did not
develop CNS metastases, α6 integrin expression on ALL cells was
associated with the occurrence of CNS relapse (Fig. 5f, g). Expression
of α6 integrin in this patient cohort was independent of other variables
considered to increase the risk of CNS disease involvement
(Extended Data Table 1).
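
Incidence comparisons of this kind (Fig. 5e) are evaluated with Fisher's exact test. The sketch below is a plain two-sided implementation for a 2 × 2 table, assuming the common convention of summing the probabilities of all tables that are no more probable than the observed one; the function name and the counts are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows are low/high expression,
# columns are paralysis present/absent (not data from the study).
p_value = fisher_exact_two_sided(1, 5, 6, 0)
print(f"P = {p_value:.4f}")
```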
Fig. 4 | ALL cells passage to the CNS subarachnoid space along the abluminal
surface of laminin+ bone marrow emissary vessels. a, Anatomy of emissary
vessels in calvarial and vertebral bone marrow. b, ALL in vitro invasion
towards human CSF or CXCL12 (also known as SDF1) with or without
CXCR4 blockade (AMD3100). Data are mean ± s.e.m.; ANOVA with Tukey
test, n = 3 biological replicates, *P < 0.05, **P < 0.01, ****P < 0.0001.
c, Bone channel (bracket) containing emissary vessels (arrowheads)
connecting the vertebral bone marrow (BM) with subarachnoid space (SA)
in histologic sections of vertebrae from healthy (n = 4 mice) and
Nalm-6-engrafted (human CD10+ cells) leukaemic mice (n = 3 mice).
d, ALL in transit to the CNS through bone channels. Vertebrae of mice
engrafted with primary ALL, Nalm-6 and RCH-ACV cells are shown.
Brackets, ALL+ bone channels; arrowhead, emissary vessels. Nalm-6:
n = 27 mice, primary ALL: n = 13 mice, RCH-ACV: n = 10 mice. e, αSMA
(vascular smooth muscle, n = 26 mice) and laminin (n = 7 mice)
immunohistochemistry staining of emissary vessels (indicated by the
arrowheads). Scale bars, 100 μm.
Fig. 5 | ALL cells use α6 integrin-laminin-dependent interactions to
invade the CNS. a, ALL invasion towards human CSF along laminin-collagen
matrices after treatment with α6 integrin-neutralizing antibodies,
the Rho-kinase inhibitor fasudil (at MLCK inhibitory dose), or GS-649443.
Data are mean ± s.e.m.; ANOVA with Tukey test, n = 3 biological replicates,
***P < 0.001, ****P < 0.0001. b, Frequency of ALL cell infiltration in bony
channels of the spine in vehicle- versus GS-649443-treated mice. Data are
mean ± s.e.m.; unpaired two-sided Student's t-test, Nalm-6: n = 3 vehicle-treated
mice, 4 GS-649443-treated mice, minimum of 24 vertebral sections
per mouse, P = 0.0198; primary human ALL: n = 5 mice per treatment
group, minimum of 10 vertebral sections per mouse, P = 0.0009; RCH-ACV:
n = 4 mice per treatment group, minimum of 8 vertebral sections per
mouse, P = 0.0149. c, Incidence of hindlimb paralysis at time of euthanasia.
d, CNS disease burden (number of CD10+ ALL cells) in isotype control
and α6 integrin-neutralizing antibody-treated Nalm-6-engrafted mice
at euthanasia for matched time points. Data are mean ± s.e.m.; unpaired
two-sided Student's t-test, n = 3 mice per treatment group, P = 0.0234.
e, Incidence of CNS disease symptoms (hindlimb paralysis) at clinical end
point in mice engrafted with ALL cells expressing low versus high levels of
integrin α6. Fisher's exact test, primary human ALL integrin α6 low, REH
and RCH-ACV: n = 6 mice each; SUP-B15: n = 3 mice; Nalm-6: n = 10
mice, P = 0.0008. f, Association between α6 integrin expression in bone
marrow blasts and CNS disease relapse in patients with ALL. Freeman-Halton
extension of Fisher's exact test, no CNS relapse, n = 18 patients,
CNS relapse, n = 8 patients, P = 0.0282. g, Representative α6 integrin
immunohistochemistry of bone marrow biopsy samples from patients
(0-0.5+: n = 17 patients; 1+: n = 4 patients; 2+-3+: n = 5 patients).
h, Schematic of ALL CNS invasion model. Scale bars, 50 μm.
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There are currently no molecularly targeted interventions to prevent 
or treat CNS disease in ALL, reflecting a limited understanding of the 
mechanisms of ALL metastasis to the CNS. Here we show that ALL cells 
use a NSPC embryonic pathfinding mechanism in order to traverse the 
vascular channels that connect the bone marrow and meninges, the 
predominant site of CNS ALL disease in humans'**°. Moreover, we 
show that this process can be intercepted in mouse xenograft models, 
providing a rationale for using clinically available PI3K6 inhibitors to 
prevent CNS disease involvement in patients. 

Leukaemic cells have previously been shown to hijack haematopoietic
stem cell trafficking mechanisms to metastasize within the bone marrow
microenvironment. Our data demonstrate that this pattern of molecular
plagiarism by ALL cells extends across organ systems, potentially
to exploit a path of least resistance into the CNS. By travelling along the
external (abluminal) surface of vessels that are topologically contiguous
with the CNS subarachnoid space, ALL cells migrate directly from the
bone marrow to the CNS, bypassing the need to enter and exit the
CNS vasculature (Fig. 5h). In so doing, ALL cells phenocopy α6 integrin-dependent
mechanisms used by NSPCs to migrate to the olfactory
bulb along the ECM of vessels. In future studies, it will be important
to examine whether ALL cells similarly utilize other integrin-ECM
interactions that drive neuronal pathfinding and developmental organization
in order to metastasize to the CNS.

Although the CNS is widely considered to be an immune-privileged 
site, the recent discovery in mice of a lymphatic system in the dura mater
that drains macromolecules and immune cells from the brain suggests
that other unrecognized pathways might permit communication
between the periphery and the CNS. Future studies may reveal that
this unique ALL trafficking pathway is involved in immune surveillance 
or inflammatory processes. Exploring the interactions between normal 
and malignant immune cells and these vascular scaffolds may thus 
reveal multiple points of intervention to treat CNS invasive processes.

METHODS


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized. 

Collection of bone marrow, spleen and CSF cells. Mouse femurs were collected 
and bone marrow cells were aspirated with RPMI 1640 medium containing 10% 
fetal bovine serum (FBS). Mouse spleen tissues were homogenized in RPMI 1640 
medium containing 10% FBS using a 15-ml syringe. The spine was removed from 
mice and carefully separated into individual vertebral bodies by cutting through 
the intervertebral discs. The spinal cord and meninges of each vertebral body were 
then washed with RPMI 1640 containing 10% FBS to collect cells from the CSF.
The bone marrow, spleen and CSF cells were passed through 70-μm filters, washed
with phosphate-buffered saline (PBS) and treated with ACK lysis buffer to remove 
red blood cells. 

Flow cytometry analysis. For in vitro cell surface receptor staining, 1 x 10° cells 
per 100 μl were stained with antibody for 30 min at 4°C in Automacs buffer (BD
Biosciences) containing 3% BSA, washed with 3% BSA, and analysed on a FACS 
Canto II cytometer. For apoptosis analysis, cells were resuspended in Annexin V 
Binding Buffer (BD Biosciences) and stained with anti-annexin V-AF647 antibody 
and propidium iodide for 15 min at room temperature and analysed. For cell cycle 
analysis, cells were first stained for CD10 and then fixed in 2% paraformaldehyde 
for 30 min at 4°C, permeabilized in 80% ethanol overnight at −20°C, washed and
resuspended in Automacs buffer containing 3% BSA and then incubated with
anti-human-KI-67 for 1 h at 4°C, followed by washing with 3% BSA before analysis.
In some experiments, cell cycle analysis was performed by staining 1 x 10° cells
per 100 μl Automacs buffer containing 1% FBS with 4 μg ml−1 Hoechst 34580
(Invitrogen) for 30 min at room temperature. An example of our gating strategy is 
shown in Supplementary Fig. 2. 

Labelling of cells using fluorescent dyes. Cells were fluorescently labelled through
incubation with DiR lipophilic dye (DiIC18(7); 1,1′-dioctadecyl-3,3,3′,3′-tetramethylindotricarbocyanine
iodide; Invitrogen) as previously described. In
brief, cells were isolated at a density of 2.5 x 10° cells per ml and resuspended in 
251M DiR in culture medium containing 10% FBS. Cells were protected from 
the light and rotated at room temperature for 30 min. After incubation, cells were 
washed twice in PBS, counted and resuspended at desired concentrations for 
engraftment. Dye labelling efficiency was tested using flow cytometry. 

Mouse engraftment. Specific pathogen-free 6- to 8-week-old male and female 
SCID mice (Charles River) were inoculated intravenously with 5 x 10° Nalm-6- 
GFP, RCH-ACV or REH cells in PBS through the tail vein. Similarly, primary ALL 
or SUP-B15 cells were engrafted into 6- to 8-week-old male and female NOD/SCID/IL-2Rγ
(NSG) mice (Charles River) by intravenous injection. All experimental
procedures involving mice were approved by the Animal Care and Use
Committee of Duke University. The Institutional Review Board of Duke University 
approved the use of de-identified primary human ALL cells in this study (Protocol 
00006268, 19-April-08). All experiments were performed in accordance with the 
relevant ethical guidelines and regulations. 

Administration of GS-649443. GS-649443 or vehicle (Gilead Sciences, Inc.) was
administered to Nalm-6-GFP, RCH-ACV, REH, SUP-B15 or primary ALL cell-engrafted
mice by oral gavage from day 1 or day 20 after engraftment. Mice were
dosed twice a day with 2 mg kg−1 GS-649443. Mice were monitored daily and
euthanized when they showed hindlimb paralysis or other observable CNS symptoms,
severe cachexia (>20% weight loss), respiratory or other distress, or extreme
lethargy. For paired-animal analysis, mice from the vehicle- and GS-649443-treated 
groups were paired at the beginning of the study. Both mice in each pair were 
euthanized when either of them showed any of the symptoms outlined above. 
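
The paired end-point rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the authors' analysis code; the function name `paired_end_point` and the example days are hypothetical.

```python
from typing import Optional


def paired_end_point(vehicle_day: Optional[int],
                     treated_day: Optional[int]) -> Optional[int]:
    """Return the day on which BOTH mice of a pair are euthanized.

    Each argument is the day (after engraftment) on which that mouse first
    met an end-point criterion, or None if it never did during follow-up.
    Under the paired design, the pair ends as soon as either mouse does.
    """
    days = [d for d in (vehicle_day, treated_day) if d is not None]
    return min(days) if days else None


# Hypothetical pair: the vehicle mouse shows symptoms on day 28,
# the treated mouse would not have shown them until day 35.
pair_day = paired_end_point(28, 35)  # both mice are collected on day 28
```

Collecting both animals on the same day is what makes the later paired t-tests on tumour burden a comparison at matched time points.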

Administration of α6 integrin-neutralizing antibodies. Anti-human α6
integrin-neutralizing antibodies (Invitrogen) or isotype control antibodies (rat
gamma globulin) (Jackson ImmunoResearch) were administered intravenously
to Nalm-6-GFP cell-engrafted mice. The mice were dosed with antibodies weekly
at 3 mg kg−1. Mice were monitored daily and euthanized at the clinical end points
described above. 

Cell lines and culture. The Nalm-6 (ACC-128) and RCH-ACV (ACC-548) cell 
lines were purchased from DSMZ. REH (CRL-8286) and SUP-B15 (CRL-1929) cell 
lines were purchased from ATCC. STR profiling was repeated routinely to authen- 
ticate cell lines. Mycoplasma testing by PCR was performed every three months 
on all cell lines in culture. All mycoplasma testing by PCR was negative during the 
entire duration of this study. The GFP-expressing clone of Nalm-6 was generated 
as described previously. Nalm-6, REH and RCH-ACV cells were cultured in
RPMI 1640 (Corning Inc.) supplemented with 10% FBS (Gemini Bio-Products).
SUP-B15 cells were cultured in Iscove's modified Dulbecco's medium (IMDM),
supplemented with 20% FBS and 0.05 mM 2-mercaptoethanol. All cultures were
maintained at 37°C in a humidified 5% CO2 atmosphere.

Antibodies for flow cytometry. The following antibodies were purchased 
from BD Pharmingen: an allophycocyanin (APC)-conjugated mouse antibody 
against human CD10, peridinin chlorophyll protein complex with cyanine-5.5
(PerCP-Cy5.5)-conjugated mouse antibody against human CD10, phycoerythrin
(PE)-conjugated rat antibodies against human CXCR4, APC-conjugated mouse 
antibody against human CXCR3, APC-conjugated mouse antibody against human 
CD44, PE-conjugated mouse antibodies against human CD24, APC-conjugated rat 
antibody against mouse CD45, PE-conjugated rat antibody against human CD49f, 
Alexa Fluor647-conjugated annexin-V antibody, PE-conjugated KI-67 antibody 
and PE-conjugated mouse antibody against human CD19. 

Measurement of GS-649443 concentration in plasma and CNS. Brain tissues 
were collected after perfusion of mice with saline. The thoracic cavity of the mouse 
was opened and saline was perfused at a rate of 5 ml min−1 with a 23G needle
through the left cardiac ventricle for 6 min. Collected brain tissues were homog- 
enized, diluted three times and treated with acetonitrile. Plasma samples were 
diluted twice and treated with acetonitrile. The above mixtures were centrifuged 
at 4,275 r.p.m. for 15 min and their supernatants were transferred to a 96-well plate 
for analysis using the API 5000 triple quadrupole mass spectrometer (AB Sciex). 

Transwell migration assay. Migration assays were performed using 24-well plates
containing 0.3% FBS in medium. Nalm-6-GFP cells were treated with vehicle,
GS-649443 (100 nM), AZD5363 (10 μM) or AMD3100 (5 μg ml−1) for 20 min in
serum-free medium and seeded on an uncoated polycarbonate membrane insert
(6.5 mm in diameter with 8.0-μm pores) in a Transwell apparatus (Corning). The
lower chamber of the Transwell plate was loaded with 600 μl of medium with 0.3%
serum alone or CXCL12 (100 ng ml−1) as a positive control. After incubation for
3h at 37°C, the inserts were removed and cells that had migrated to the bottom 
were counted using a haemocytometer. 

Western blot analysis. Following treatment with GS-649443 or anti-α6 integrin,
more than 1 x 10° cells were collected by centrifugation and washed with
ice-cold PBS. Ice-cold RIPA buffer (50 mM Tris-HCl (pH 7.4), 0.25 M NaCl, 5 mM
EDTA, 20 mM NaF, 1% NP-40) containing fresh protease inhibitor cocktail (Sigma-
Aldrich) and phosphatase inhibitor cocktails 2 and 3 (Sigma-Aldrich) was added 
to the cells. The suspension was transferred into a centrifuge tube and placed on 
ice for 3 min. The cell suspension was cleared by centrifugation at 14,000g for 
3 min at 4°C. The supernatants (total cell lysate) were used immediately or stored
at −80°C. Protein concentrations were determined using the DC Protein Assay
(Bio-Rad Laboratories). Samples (20 μg of protein) were analysed using the following
primary antibodies, as indicated: anti-phosphorylated-myosin light chain
2 (Thr18/Ser19) (Cell Signaling Technology), anti-myosin light chain 2 (Cell
Signaling Technology), anti-β-actin (Abcam). Horseradish peroxidase (HRP)-
coupled rabbit IgG (Cell Signaling Technology) and goat anti-mouse IgG-HRP 
(ThermoFisher) were used as secondary antibodies, and immunoreactive proteins 
were detected by enhanced chemiluminescence (ECL) (ThermoFisher). 

Three-dimensional invasion assays. Cells in 0.3% serum were treated with
GS-649443 (100 nM) for 16 h, or cultured in 0.3% serum overnight before the
addition of anti-human α6 integrin-neutralizing antibodies for 3 h (Invitrogen,
40 μg ml−1) or AMD3100 (5 μg ml−1) for 20 min. Cells were resuspended in
serum-free rat-tail collagen (Advanced Biomatrix 5153 at 3 mg ml−1) alone or
supplemented with laminin (Sigma-Aldrich, L2020) or fibronectin (Sigma-Aldrich,
F1141). Wherever unspecified, cells were resuspended in collagen and laminin
(0.01 mg ml−1). Resuspended cells were aliquoted into 96-well plates and spun
down to the bottom of the plate. Collagen was allowed to polymerize for 2 h and
cell culture medium or human CSF (LEE Biosolutions, 991-19-P) was added on top
of the gel as a chemoattractant. After 16 h of incubation at 37 °C, plates were fixed
and stained with 5 μg ml−1 Hoechst 33258 (Molecular Probes-Life Technologies).
Plates were imaged on a Zeiss 780 inverted confocal microscope (Carl Zeiss,
Germany) with Zen software. The three-dimensional migration index was calculated
as the number of invading cells at 50 μm divided by the total number of cells.
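
As a minimal sketch of the migration index defined above (invading cells counted at the 50-μm plane divided by all cells in the well), the ratio could be computed as follows. The function name and the example counts are hypothetical and are not taken from the study.

```python
def migration_index(invading_cells: int, total_cells: int) -> float:
    """Three-dimensional migration index: invading cells / total cells.

    invading_cells: nuclei counted at the 50-um plane above the plate bottom.
    total_cells:    all nuclei counted in the same well.
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= invading_cells <= total_cells:
        raise ValueError("invading_cells must lie between 0 and total_cells")
    return invading_cells / total_cells


# Hypothetical well: 25 of 100 counted nuclei are found at 50 um.
example_index = migration_index(25, 100)  # 0.25
```

Because the index is a fraction of the cells in each well, it is comparable across wells that were seeded at slightly different densities.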

Immunohistochemistry. CD10, αSMA, laminin, α6 integrin and GFP immunohistochemistry
staining were performed under the following conditions. CD10:
antigen retrieval was performed using EDTA Target Retrieval solution (DAKO).
Slides were stained with anti-CD10 (clone EPR5904-110; Abcam) at 1:4,000.
DAKO Envision System-HRP (DAB), for use with rabbit (K4011; DAKO), was
used as a secondary antibody. αSMA: no antigen retrieval was performed. Slides
were stained with anti-6-actin, α-smooth muscle actin (clone 1A4; Sigma-Aldrich)
at a concentration of 3 μg ml−1. M.O.M. Biotinylated Anti-Mouse IgG Reagent,
for use with mouse (PK-2200; Vector Laboratories), was used as a secondary antibody.
Laminin: antigen retrieval was performed using Citrate Target Retrieval
solution (DAKO). Slides were stained with anti-laminin (clone ab11575; Abcam)
at a concentration of 5 μg ml−1. DAKO Envision System-HRP (DAB), for use with
rabbit (K4011; DAKO), was used as a secondary antibody. α6 integrin: antigen
retrieval was performed using EDTA Target Retrieval solution (DAKO). Slides
were stained with anti-integrin α6 (clone EPR18124; Abcam) at a concentration
of 641g ml−1. DAKO Envision System-HRP (DAB), for use with rabbit (K4011;
DAKO), was used as a secondary antibody. GFP: slides were stained with anti-GFP
(NB600-308; NOVUS Biologicals) at a concentration of 1 μg ml−1. DAKO Envision
System-HRP (DAB), for use with rabbit (K4011; DAKO), was used as a secondary
antibody. Nalm-6-GFP+ ALL cell counts in individual GFP-stained brain sections
were performed by a trained haematopathologist (C.M.M.). Scoring of ALL cell α6
integrin expression in mouse bone marrow specimens was performed in a blinded
fashion by a trained haematopathologist (C.M.M.).

Three-dimensional reconstructions. Three-dimensional reconstructions of emissary
vessels were acquired from 30-μm sections of the spinal cord stained with
eosin and haematoxylin. Consecutive images were acquired at regular intervals 
and assembled into movies using VirtualDub64 software. 

Quantification of vertebral bone channels containing ALL cells. First, 5-μm
sections of the spinal cord were stained with eosin and haematoxylin. Then, ver- 
tebral bone channels that contained ALL cells were identified and counted in each 
intact spinal section. This analysis was performed in a blinded fashion by two 
independent laboratory members. 

Human samples. The study design and use of clinical bone marrow biopsy sam- 
ples was approved by the Institutional Review Board of Duke University (protocol 
57532). A waiver of informed consent was granted as the research was considered 
to present no more than minimal risk of harm to participants. Electronic medical 
records of the Duke University Health System from 2009 to 2016 were searched 
using the Duke Enterprise Data Unified Content Explorer (DEDUCE) online query 
system in order to identify patients of any age with ALL diagnoses. The medical 
chart of each patient was then reviewed to confirm the pathologic diagnosis as 
well as the availability of archived diagnostic and/or relapse bone marrow biopsy 
specimens. Patients with ALL with CNS relapse confirmed through identification 
of malignant cells in the CSF by cytology and/or immunophenotyping were retro- 
spectively enrolled. Patients with ALL who did not undergo bone marrow trans- 
plantation and who were without relapse in the bone marrow or CNS after more 
than 5 years of follow-up served as control patients. Patients with ALL who did 
not undergo bone marrow transplantation and who relapsed in the bone marrow 
but had no clinical evidence of CNS relapse and confirmed negative CSF cytology/ 
immunophenotyping served as additional control patients. Scoring of α6 integrin
expression in ALL cells in immunohistochemically stained bone marrow biopsies
was performed by a trained haematopathologist. 

Intravital microscopy. SCID mice were anaesthetized using isoflurane and 
a rectangular incision was made in the scalp, revealing the intact, underlying 
cortical bone. The region was washed with PBS and fluorescently labelled dextran 
(Dex-FITC or Dex-AF647) was administered via tail vein injection to highlight 
the vasculature. Dex-AF647 is a large molecular weight dextran molecule conju- 
gated to the fluorophore Alexa Fluor647. Because it has a prolonged blood-pool 
half-life, it is a useful means of visualizing the vasculature in in vivo studies. Mice 
were placed in a specially designed restrictor and a cover slip was placed over the 
exposed calvarial bone. Mice remained anaesthetized throughout the procedure. 
High-resolution images were obtained through the intact mouse skull using a Leica 
SP5 confocal and multiphoton microscope with a 20×/0.40 numerical aperture
(NA) objective lens. The system utilizes a femtosecond titanium:sapphire laser 
(Chameleon) for multi-photon or single-photon excitation and multiple Cs lasers 
(including an Argon laser, a HeNe laser and 561- and 633-nm diode lasers) for 
single-photon excitation. Images were captured using Leica LAS-AF software 
using line and frame averaging. The calvarial bone marrow was subdivided into 
numbered anatomical areas and overlapping 20× images of the entire region were
captured. After the procedure, these images were merged using Photoshop v.12.0.4
to generate a montage image of the entire calvarium. Cells were then enumerated
using the cell counter function in ImageJ.

Skull thinning. SCID mice were anaesthetized using isoflurane and the corti- 
cal bone was exposed following a sagittal incision in the scalp. The head of the 
mouse was secured using a stereotactic apparatus. A drill (Hager and Meisinger 
GMBH size 1007, through Fine Surgical Tools) was used to thin the skull, creating 
a small, circular bowl-like area with a flat bottom. A microsurgical blade was used 
to carefully complete the thinning by hand (Surgistar, 38-6961). High-resolution 
images of the leptomeninges were obtained through the thinned skull window 
using a Leica SP5 confocal and multi-photon microscope with a 20x/0.40 NA 
objective lens. 


Confocal microscopy brain whole-mount imaging. SCID mice were engrafted 
with 5 x 10° DiR-labelled cells and confocal imaging of the brain was performed 
at indicated time points after engraftment (day 0, 2h after engraftment; day 3; day 
10). Mice were euthanized at imaging time points, the brain was removed and 
then cut sagittally along the midline into thin sections for whole-mount imaging. 
High-resolution images of ALL cells in brain parenchymal spaces were obtained
using a Leica SP5 confocal and multi-photon microscope with a 20×/0.40 NA
objective lens. The system uses a femtosecond titanium:sapphire laser (Chameleon) 
for multi-photon or single-photon excitation and multiple Cs lasers (including an 
argon laser, a HeNe laser, and 561- and 633-nm diode lasers) for single-photon 
excitation. Images were captured with Leica LAS-AF software. 

Microarray analysis. Total RNA isolated from bone marrow and CSF was sent for 
gene expression profiling using the Clariom D Human microarrays (Affymetrix) at 
the Duke Center for Genomic and Computational Biology. Raw cell intensity data 
were imported into the Expression Console (Affymetrix) and bone marrow arrays
(vehicle and GS-649443 samples) and CSF arrays (vehicle and GS-649443) were 
normalized using the RMA algorithm. Gene level differential expression analysis 
was performed using the Transcript Analysis Console (v.3.1.0.5; Affymetrix) and 
ANOVA statistical analyses were performed on the bone marrow and CSF arrays 
separately. Transcripts identified to be differentially regulated in the bone marrow or 
CSF samples by ±1.5-fold with P < 0.05 were carried forward for pathway analysis.
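
The selection rule stated above (at least 1.5-fold change in either direction, with P < 0.05) can be illustrated with a short Python filter. This is a sketch only, not the Transcript Analysis Console workflow: the `Transcript` type, the signed linear fold-change convention (+2.0 for doubled, −2.0 for halved), the inclusive treatment of exactly 1.5-fold, and the example records are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Transcript(NamedTuple):
    transcript_id: str   # hypothetical identifier
    fold_change: float   # signed linear fold change (assumed convention)
    p_value: float       # ANOVA P value for the vehicle vs inhibitor contrast


def passes_cutoff(t: Transcript,
                  min_abs_fold: float = 1.5,
                  max_p: float = 0.05) -> bool:
    """True if the transcript changes by >= 1.5-fold (up or down) with P < 0.05."""
    return abs(t.fold_change) >= min_abs_fold and t.p_value < max_p


def select_for_pathway_analysis(transcripts: List[Transcript]) -> List[str]:
    """Return the IDs of transcripts carried forward for pathway analysis."""
    return [t.transcript_id for t in transcripts if passes_cutoff(t)]


# Hypothetical records, for illustration only.
example = [
    Transcript("T1", 2.1, 0.010),    # up, significant: kept
    Transcript("T2", -1.8, 0.030),   # down, significant: kept
    Transcript("T3", 1.2, 0.001),    # significant but below 1.5-fold: dropped
    Transcript("T4", -3.0, 0.200),   # large change but P >= 0.05: dropped
]
selected = select_for_pathway_analysis(example)
```

Applying the same filter separately to the bone marrow and CSF comparisons mirrors the text, which analyses the two sets of arrays separately.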

Statistics and reproducibility. Microarray analysis was performed as described
above. Kaplan-Meier curves with two-sided log rank Mantel-Cox analysis were 
used to assess in vivo survival. Two-sided paired Student's t-tests were used for 
analysis of vehicle versus inhibitor effects on tumour burden in mice, ALL bone 
marrow migration in mice, and GS-649443 effects on «6 integrin expression in 
ALL cells in vitro. One-way ANOVA analyses with Tukey post hoc testing for 
multiple comparisons were performed for invasion assays, transwell migration 
assays and comparison of tumour burden across bone marrow, spleen and CNS in 
vehicle versus inhibitor-treated mice. Fisher's exact tests and the Freeman-Halton 
extension of the Fisher’s exact test were used to compare «6 integrin staining levels 
in human biopsies and on CNS symptoms incidence levels. Nalm-6 migration 
along calvarial bone marrow vessels was quantified using ImageJ software (NIH).
The majority of statistical analysis was performed using GraphPad Prism version 
7.0, while the VassarStats statistical computation website was used to perform 
the Freeman-Halton extension of the Fisher’s exact test. Significant P values 
were defined as follows: NS, not significant; *P < 0.05; **P < 0.01; ***P < 0.001;
****P < 0.0001. Precise P values for data shown in the main figures using these
ranges are as follows. Figure 1e: Nalm-6 xenografts: P = 0.8142 (bone marrow),
P = 0.1014 (spleen). Figure 1g: 1° ALL xenografts: P = 0.0052 (CNS), P = 0.0266
(bone marrow), P = 0.0222 (spleen); RCH-ACV xenografts: P = 0.0067 (CNS),
P=0.0981 (bone marrow), P= 0.2465 (spleen). Figure 2b: P= 0.9657 (vehicle 
versus AZD5363), P=0.0035 (vehicle versus idelalisib), P= 0.0040 (vehicle 
versus GS-649443). Figure 2c: P=0.1337 (vehicle versus GS-649443). Figure 4b: 
P<0.0001 (control versus CSF), P= 0.0015 (CSF versus CSF and AMD3100), 
P=0.0201 (CSF and AMD3100 versus SDF1 and AMD3100), P< 0.0001 (SDF1 
versus SDF1 and AMD3100); Fig. 5a: primary ALL: P < 0.0001 (no CSF versus
CSF), P = 0.0001 (CSF versus GS-649443), P = 0.0003 (CSF versus fasudil),
P = 0.0003 (CSF versus anti-α6 integrin), Nalm-6: P < 0.0001 (no CSF versus CSF),
P < 0.0001 (CSF versus GS-649443), P < 0.0001 (CSF versus fasudil), P < 0.0001
(CSF versus anti-α6 integrin). Data are mean ± s.e.m.
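
For readers matching the precise P values above to the asterisks in the figures, the stated ranges can be written as a small lookup. This is a sketch of the thresholds given in the text; the function name `significance_label` is hypothetical.

```python
def significance_label(p: float) -> str:
    """Map a P value to the significance symbol ranges defined in the text."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


# P values quoted in the paragraph above.
labels = {
    0.8142: significance_label(0.8142),  # Fig. 1e, bone marrow
    0.0266: significance_label(0.0266),  # Fig. 1g, primary ALL, bone marrow
    0.0052: significance_label(0.0052),  # Fig. 1g, primary ALL, CNS
    0.0003: significance_label(0.0003),  # Fig. 5a, CSF versus fasudil
}
```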

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Source Data for quantifications mentioned in the text or shown 
in the graphs plotted in Figs. 1-5 and Extended Data Figs. 1-9 are available 
in the online version of this paper. Full scans of western blots are provided in 
Supplementary Fig. 1. Microarray gene expression data that support the findings of 
this study have been deposited in Gene Expression Omnibus (GEO) under acces- 
sion number GSE114627. The authors declare that the data supporting the findings 
of this study are available within the paper and its Supplementary Information, as 
Source Data or from the corresponding author upon reasonable request.

Extended Data Fig. 1 | PI3Kδ inhibition minimally decreases ALL
peripheral disease burden. a, Representative haematoxylin and eosin
(H&E)-stained histologic sections showing Nalm-6 disease burden in
the femoral bone marrow and spleen of mice treated with vehicle or
GS-649443, at matched time points. n = 6 mice per treatment group.
b, Complete blood counts of vehicle- and GS-649443-treated mice, at
matched time points. Paired two-tailed Student's t-test, n = 5 mice per
treatment group, P = 0.2158; ns, not significant. c, Fold changes in CNS
versus systemic disease burden in vehicle- and GS-649443-treated
Nalm-6-engrafted mice. Data are mean ± s.e.m.; ANOVA with Tukey's
multiple comparison test, n = 5 mice per treatment group, P = 0.0314.
d, Representative histologic sections showing primary ALL disease burden
in the femoral bone marrow and spleen of mice treated with vehicle or
GS-649443 at matched time points. n = 5 mice per treatment group.
e, Experimental plan for mice with primary ALL. f, Fold changes in
CNS versus systemic disease burden in vehicle- and GS-649443-treated
mice with primary ALL. Data are mean ± s.e.m.; ANOVA with Tukey,
n = 5 mice per treatment group, P = 0.00128 (CNS versus bone marrow),
P = 0.0098 (CNS versus spleen). g, Representative histologic sections
showing RCH-ACV disease burden in the femoral bone marrow and
spleen of mice treated with vehicle or GS-649443. n = 4 mice per treatment
group. Arrowheads indicate RCH-ACV blasts. h, Kaplan-Meier survival
curve for mice injected with RCH-ACV cells. Two-sided log rank Mantel-Cox,
n = 5 mice per treatment group, P = 0.0067. i, Fold changes in CNS
versus systemic disease burden in vehicle- and GS-649443-treated mice
injected with RCH-ACV cells. Data are mean ± s.e.m.; ANOVA with
Tukey, n = 4 mice per treatment group, P = 0.0244. Scale bars, 100 μm.

Extended Data Fig. 2 | PI3Kδ inhibition at levels achievable in mouse
CSF does not affect the apoptotic or cell cycle index of Nalm-6 cells;
ROCK inhibition does not alter survival or disease burden in Nalm-6
leukaemic mice. a, Serum and brain tissue concentrations of GS-649443
in healthy SCID mice. Data are mean ± s.e.m.; n = 5 mice per group.
b, Effect of PI3Kδ and Akt inhibition on transwell migration of primary
ALL cells. Data are mean ± s.e.m.; ANOVA with Tukey, n = 3 technical
replicates per group, P = 0.5228 (vehicle versus AZD5363), P = 0.0549
(vehicle versus idelalisib), P = 0.0245 (vehicle versus GS-649443).
c, Percentage of annexin-V+ Nalm-6 cells following in vitro treatment
(72 h) with vehicle or GS-649443. Data are mean ± s.e.m.; unpaired two-sided
Student's t-test, n = 3 biological replicates, P = 0.2038. d, Proportion
of Nalm-6 cells in the G1, S or G2/M phase of the cell cycle following in
vitro treatment (72 h) with vehicle or GS-649443. Data are mean ± s.e.m.;
unpaired two-sided Student's t-test, n = 3 biological replicates, P = 0.2216
(G1), P = 0.1405 (S), P = 0.0661 (G2/M). e, Percentage of annexin-V+
cells in the CD10+ fraction of the CSF in vehicle- or GS-649443-treated
mice. Paired two-sided Student's t-test, n = 5 mice per treatment group,
P = 0.6672. f, Percentage of cells in G1 in the CD10+ fraction of the CSF
in vehicle- or GS-649443-treated mice. Paired two-sided Student's t-test,
n = 5 mice per treatment group, P = 0.1477. g, Percentage of cells in S or
G2/M phase in the CD10+ fraction of the CSF in vehicle- or GS-649443-treated
mice. Paired two-sided Student's t-test, n = 5 mice per treatment
group, P = 0.5687. h, Kaplan-Meier survival curves for Nalm-6-engrafted
mice treated with or without the Rho-kinase (ROCK) inhibitor fasudil.
Two-sided log rank Mantel-Cox, P = 0.2843; n = 6 per treatment group.
i, Disease burden (CD10+ cells) at end point in the bone marrow, spleen
and CSF of vehicle- and fasudil-treated SCID mice engrafted with Nalm-6
cells. Data are mean ± s.e.m.; paired two-sided Student's t-test, n = 4 mice
per treatment group, P = 0.4534 (bone marrow), P = 0.3119 (spleen),
P = 0.8026 (CSF).

cells. a, Western blot of MLC2 levels in ALL cells following GS-649443
treatment. For gel source data, see Supplementary Fig. 1. Nalm-6 cells: […]
marrow migration over time in vehicle- versus GS-649443-treated mice.
n = 3 mice per treatment group. Scale bars, 250 μm.

Extended Data Fig. 4 | Microarray analysis of focal adhesion and
motility pathway genes in blasts from vehicle- versus GS-649443-treated
mice and validation of key candidates. a, b, Differentially
expressed focal adhesion and motility pathway genes identified by
microarray analysis of blasts from vehicle- versus GS-649443-treated mice.
n = 6 mice per treatment group. c, Microarray negative control validation.
Representative flow cytometry analysis of Nalm-6 expression of CXCR4,
CXCR3, CD44 and CD24 following treatment with vehicle or GS-649443.
n = 3 independent experiments. d, ALL α6 integrin expression in vivo by
flow cytometry. Data are mean ± s.e.m.; n = 4 mice.

Extended Data Fig. 5 | Cell surface expression of α6 integrin is
variable across a panel of ALL cell lines and primary human ALL
cells in vitro and in vivo. a, Flow cytometry analysis of the percentage
of α6 integrin+ populations in ALL cell lines and primary ALL cells.
Data are mean ± s.e.m.; n = 3 independent experiments per cell line.
b, Representative flow cytometry histograms of α6 integrin expression
in ALL cell lines and primary ALL cells. n = 3 independent experiments.
c, Representative α6 integrin immunohistochemistry of femoral bone
marrow from mice engrafted with low α6 integrin-expressing versus high
α6 integrin-expressing ALL cells. α6 integrin low primary human ALL:
n = 6 mice; REH: n = 6 mice; SUP-B15: n = 3 mice; RCH-ACV: n = 6 mice;
Nalm-6: n = 10 mice.

Extended Data Fig. 6 | ALL cells do not invade through brain
microvasculature. a, Representative confocal microscopy images of
ALL cells located within brain microvessels at various time points after
intravenous engraftment. n = 3 mice per cell line and time point, 12
micrographs per mouse. White boxes outline areas shown at higher
magnification in Fig. 3a. b, c, Quantification of the number of ALL
cells located in brain parenchymal tissue or within brain parenchymal
microvessels at various time points after intravenous engraftment of RCH- […]
replicates per time point. d, Cartoon showing vasculature within the
choroid plexus, leptomeninges and brain parenchyma in humans.
e, Representative GFP immunohistochemistry staining of brain
parenchyma of Nalm-6-GFP-engrafted mice on day 0 after intravenous
engraftment (n = 3 mice) or at end stage disease (n = 5 mice). Images show
close-ups of the ALL cells that are highlighted by arrowheads in Fig. 3f.
Scale bars, 100 μm.

Extended Data Fig. 7 | Nalm-6 cells do not invade via diapedesis
through the leptomeningeal blood-brain barrier. a, Graphic of thinned
skull window and video-rate intravital confocal microscopy approach
used to image the leptomeningeal vasculature at various time points after
engraftment with ALL cells. b, Representative still images from video-rate
intravital microscopy analysis of leptomeningeal and superficial cerebral
vasculature (red) at 10 min after intravenous Nalm-6 engraftment (full
video presented in Supplementary Videos 1, 2). Nalm-6 cells (green) are
observed in circulation. n = 12 mice imaged on day 0 after engraftment.
c, Series of still images of leptomeningeal and superficial cerebral
vasculature at 45 min after intravenous Nalm-6 engraftment. A Nalm-6 cell
is observed adherent to the luminal side of a leptomeningeal vessel (white
arrowhead). A second Nalm-6 cell is observed rolling along the luminal
wall of a leptomeningeal vessel (blue arrowhead) (see Supplementary
Video 3). No invasion via diapedesis was observed during the entirety of
each 2-4-h long imaging session on the day of engraftment. n = 12 mice
imaged on day 0 after engraftment. d, Still images of the leptomeningeal
and superficial cerebral vasculature 12 days after Nalm-6 engraftment. No
Nalm-6 cells were observed in circulation or within the leptomeningeal
tissue (see Supplementary Video 4; n = 4 mice, days 2, 4, 7 and 12
after engraftment). e, Representative images from intravital confocal
microscopy of the calvarial bone marrow at 2 h after Nalm-6 engraftment.
Numerous Nalm-6 cells (white arrowheads) are seen to have invaded
through the bone marrow vasculature soon after intravenous engraftment.
n = 15 mice. f, Series of still images of the z plane of the leptomeningeal
and superficial cerebral vasculature of a leukaemic mouse engrafted
with Nalm-6 cells at disease end point. n = 7 mice, days 37-39 after
engraftment. Nalm-6 cells are observed in circulation (white arrowheads),
but no cells are observed to invade (see Supplementary Video 5).

Extended Data Fig. 8 | ALL cell invasion along laminin matrices is
regulated by PI3K signalling and α6 integrin. a, Primary ALL and 
Nalm-6 in vitro invasion towards human CSF along collagen, collagen 
and laminin, or collagen and fibronectin matrices. Data are mean ± s.e.m.; 
ANOVA with Tukey, n = 3 biologically independent experiments; 
primary ALL: P = 0.8937 (collagen versus 0.001 mg ml⁻¹ laminin), 
P = 0.0020 (collagen versus 0.005 mg ml⁻¹ laminin), P < 0.0001 (collagen 
versus 0.01 mg ml⁻¹ laminin), P = 0.0604 (0.01 mg ml⁻¹ laminin versus 
fibronectin); Nalm-6: P = 0.0012 (collagen versus 0.01 mg ml⁻¹ laminin), 
P = 0.0150 (0.01 mg ml⁻¹ laminin versus fibronectin). b, Comparative 
in vitro invasion of low α6 integrin-expressing ALL cells versus high 
α6 integrin-expressing ALL cells towards human CSF along laminin 
matrices and effects of PI3Kδ inhibition or α6 integrin blockade. Data 
are mean ± s.e.m.; ANOVA with Tukey, n = 3 biologically independent 
experiments; REH: P > 0.9999 (collagen versus laminin), P = 0.6967 
(collagen and laminin versus collagen and laminin and GS-649443), 
P > 0.9999 (collagen and laminin versus collagen and laminin and 
anti-integrin-α6); primary ALL: P = 0.9758 (collagen versus laminin), 
P = 0.9974 (collagen and laminin versus collagen and laminin and 
GS-649443), P = 0.9993 (collagen and laminin versus collagen 
and laminin and anti-integrin-α6); RCH: P = 0.2446 (collagen versus 
laminin), P = 0.0079 (collagen and laminin versus collagen and laminin 
and GS-649443), P = 0.2549 (collagen and laminin versus collagen and 
laminin and anti-integrin-α6); SUP-B15: P = 0.0122 (collagen versus 
laminin), P = 0.0080 (collagen and laminin versus collagen and laminin 
and GS-649443), P = 0.0112 (collagen and laminin versus collagen and 
laminin and anti-integrin-α6). *P < 0.05, **P < 0.01, ***P < 0.001. 
Extended Data Fig. 9 | Anti-α6 integrin-blocking antibody treatment 
prolongs survival, but does not alter disease burden in the bone 
marrow or spleen of Nalm-6-engrafted leukaemic mice. a, Schematic 
for the treatment of mice with the α6 integrin-neutralizing antibody. 
b, Kaplan–Meier survival curves for Nalm-6-engrafted mice treated with 
or without anti-integrin-α6 blocking antibodies. Two-sided log-rank 
Mantel–Cox test, n = 3 mice per treatment group, P = 0.0224. c, d, Disease 
burden at end point in the bone marrow and spleen of vehicle-treated and 
anti-integrin-α6 antibody-treated Nalm-6-engrafted leukaemic mice. Data are 
mean ± s.e.m.; paired two-sided Student's t-test, n = 3 mice per treatment 
group, P = 0.7874 (bone marrow), P = 0.1595 (spleen). 
Extended Data Table 1 | α6 integrin expression is associated with CNS relapse independently of other known risk factors for CNS disease recurrence 
Summary of clinical data for patients with ALL. Statistical testing was performed to determine potential associations between α6 integrin expression and known risk factors for CNS disease relapse. 
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Topological negative refraction of surface acoustic 
waves in a Weyl phononic crystal 


Hailong He, Chunyin Qiu, Liping Ye, Xiangxi Cai, Xiying Fan, Manzhu Ke, Fan Zhang & Zhengyou Liu* 


Reflection and refraction of waves occur at the interface between two 
different media. These two fundamental interfacial wave phenomena 
form the basis of fabricating various wave components, such as optical 
lenses. Classical refraction—now referred to as positive refraction— 
causes the transmitted wave to appear on the opposite side of the 
interface normal compared to the incident wave. By contrast, 
negative refraction results in the transmitted wave emerging on the 
same side of the interface normal. It has been observed in artificial 
materials, following its theoretical prediction, and has stimulated 
many applications including super-resolution imaging. In general, 
reflection is inevitable during the refraction process, but this is often 
undesirable in designing wave functional devices. Here we report 
negative refraction of topological surface waves hosted by a Weyl 
phononic crystal—an acoustic analogue of the recently discovered 
Weyl semimetals. The interfaces at which this topological negative 
refraction occurs are one-dimensional edges separating different 
facets of the crystal. By tailoring the surface terminations of the Weyl 
phononic crystal, constant-frequency contours of surface acoustic 
waves can be designed to produce negative refraction at certain 
interfaces, while positive refraction is realized at different interfaces 
within the same sample. In contrast to the more familiar behaviour 
of waves at interfaces, unwanted reflection can be prevented in our 
crystal, owing to the open nature of the constant-frequency contours, 
which is a hallmark of the topologically protected surface states 
in Weyl crystals. 

Weyl semimetals, which feature doubly degenerate, linear band 
crossing points in three-dimensional momentum space, have become 
a research focus in the field of topological matter. Weyl points 
are monopoles of Berry flux with topological charges characterized 
by quantized Chern numbers. In addition to the bulk properties that 
manifest the chiral anomaly of Weyl semimetals, their non-trivial 
band topology means that their boundaries host one-way gapless 
surface states. In particular, the surface band dispersion at the 
Fermi energy can form open arcs that connect projected Weyl points of 
opposite topological charge. Weyl physics has recently been extended to 
artificial structures for classical waves, such as photonic and phononic 
crystals, in which materials with different optical or acoustic properties 
are arranged periodically. Soon after the experimental discovery 
of photonic Weyl points in a double-gyroid structure, the associated 
surface arc states were observed successfully. In particular, more 
controllable structure design and less demanding signal detection have 
enabled macroscopic classical systems to be used to explore the 
topological band physics originally proposed in electronic systems. 
Here, using a Weyl phononic crystal with precisely designed surface 
terminations, we present an experimental observation of topological 
negative refraction of the surface arc states. Our results advance the 
current knowledge of interfacial acoustics and enrich the Weyl physics 
in condensed-matter systems, providing a foundation for the exotic 
phenomena that arise from band topology. 

In general, the response of an incident wave to a flat interface 
between two different sound media can be predicted using 
equifrequency contour (EFC) analysis, which is applicable to the interfacial 
systems that are formed both by naturally occurring homogenous 
media and by artificially constructed periodic structures. As an extension 
of Snell’s law for homogenous media, EFC analysis can be performed 
assuming three fundamental criteria: first, that the incoming 
and outgoing beams have equal frequency for the linear media con- 
sidered here; second, that momentum is conserved parallel to the 
interface; and third, that the reflected and refracted beams must both 
leave the interface, yielding a constraint on their group velocities. In 
Fig. 1 we illustrate various sound responses at one- or two-dimensional 
interfaces that separate different sound media. In contrast to the 
normal, positive refraction (Fig. 1a) that is observed between two 
homogenous media (illustrated by EFCs expanding with frequency), 
anomalous negative refraction (Fig. 1b) emerges when the outgoing 
medium is replaced by a phononic crystal with a negative refraction 
index (illustrated by EFCs contracting with frequency). Note that 
all of these EFCs are closed orbits and inevitably produce unwanted 
interfacial reflection. Positive and negative refractions are both feasible 
for interfacial systems constructed using three-dimensional Weyl phononic 
crystals (Fig. 1c, d). By contrast, interfacial reflection can be 
made forbidden by taking advantage of the open EFCs of topological 
surface acoustic waves (SAWs), at which no reflected surface mode is 
allowed inherently. (The EFCs may also have a contribution from the 
bulk band projection of the Weyl phononic crystal, but for simplicity 
here we focus on the cases for which the surface arcs carry parallel 
momenta that are different from those of the bulk states; otherwise, 
possible energy leakage into the bulk must be taken into account.) As 
we show below, such interfacial phenomena can be realized in a single 
Weyl phononic crystal with precisely designed surface terminations, 
the interfaces of which are the one-dimensional edges shared by two 
adjacent facets. 
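
The three EFC criteria listed above can be illustrated with a minimal sketch for the simplest case of circular, closed EFCs (as in Fig. 1a, b). The function name, its arguments and the choice of a flat interface with the normal along the first axis are assumptions made for illustration only; they are not taken from the paper, and the open surface arcs of the Weyl phononic crystal are not modelled here.

```python
import math

def refracted_angle(k_in, k_out, theta_in, contracting=False):
    """Refraction angle (radians) for circular EFCs, or None if no state exists.

    k_in, k_out : EFC radii of the incident and outgoing media at one frequency
                  (criterion 1: equal frequency on both sides).
    theta_in    : incidence angle measured from the interface normal.
    contracting : True if the outgoing EFC shrinks with frequency, so that the
                  group velocity is antiparallel to the wavevector.
    """
    # Criterion 2: the momentum component parallel to the interface is conserved.
    k_par = k_in * math.sin(theta_in)
    if abs(k_par) > k_out:
        return None  # no propagating state on the outgoing EFC
    k_perp = math.sqrt(k_out ** 2 - k_par ** 2)
    # Criterion 3: the refracted beam must leave the interface, so the normal
    # component of the group velocity is taken positive. For a contracting EFC
    # this forces the wavevector to point backwards, which flips the sign of
    # the parallel component of the group velocity.
    sign = -1.0 if contracting else 1.0
    v_perp = k_perp / k_out
    v_par = sign * k_par / k_out
    return math.atan2(v_par, v_perp)

theta = math.radians(30.0)
positive = refracted_angle(1.0, 2.0, theta)                    # opposite side of the normal
negative = refracted_angle(1.0, 2.0, theta, contracting=True)  # same side of the normal
```

A positive return value means the beam keeps its direction of travel along the interface (positive refraction); a negative value means that direction is reversed (negative refraction).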

We consider a ‘woodpile’ phononic crystal, which consists of stacked 
trilayer building blocks with broken inversion symmetry (Fig. 2a-c). 
Each trilayer unit consists of three identical square epoxy rods that are 
twisted anticlockwise by 2π/3 along the z direction layer by layer, forming 
a triangular lattice in the x–y plane. The side length of each square 
rod is 9.8 mm and the in-plane and out-of-plane lattice constants are 
29.4 mm. The experimental sample has a cuboid geometry with a size 
of 70.6 cm × 68.7 cm × 64.7 cm; that is, 22 trilayer units along the 
z direction and 24 × 27 structural units in the x–y plane. Termination 
geometries for the four side surfaces are prepared using a laser-cutting 
technique. The side surfaces denoted YZ1 and YZ2 have the same 
appearance, whereas those denoted XZ1 and XZ2 are markedly different 
(Fig. 2b, d–f). Such a design gives the SAW EFCs the desired properties 
for realizing topological negative refraction (Fig. 1d) at one edge, 
together with positive refraction (Fig. 1c) at another edge of the same 
sample. As illustrated in Fig. 2d–f, we use three connected side surfaces 
(XZ1, YZ1 and XZ2) to demonstrate the interfacial phenomena. To best 
characterize the one-way chiral SAWs and their interfacial responses, 
the three side surfaces are cast onto the same plane and a unified x axis 
pointing to the right is used for the surfaces XZ1 and XZ2. Acoustically 
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Fig. 1 | Schematics of different responses of sound waves to interfaces. 
The upper and lower panels are EFC analyses and associated beam 
propagations, respectively. In the upper panels, solid curves are EFCs at 
the operating frequency and dashed curves are EFCs at a slightly larger 
frequency. The arrows labelled 1, 2 and 3 point towards EFCs of increasing 
frequencies. For simplicity, circular EFCs (centred at O or O’), closed or 
open, are illustrated. In the lower panels, the arrows labelled 1, 2, and 3 
indicate the incident, reflected and refracted beams, respectively, with 
group velocities defined by v_g = ∇_k ω, where k and ω are the wavevector 
and angular frequency, respectively. a, Conventional positive refraction at 


hard epoxy plates, as trivial insulators for sound, are closely attached 
on the sample to host the topological surface arc states. 
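
As a quick consistency check, the quoted sample size can be recovered from the lattice constants and the unit counts. The sketch below assumes that the 24 units run along x with spacing a, that the 27 units run along y with the row spacing √3a/2 of a triangular lattice, and that the 22 trilayer units run along z with spacing h = a. This assignment is an inference from the numbers, not something stated explicitly in the text.

```python
import math

A_MM = 29.4   # in-plane lattice constant a (from the text)
H_MM = 29.4   # out-of-plane lattice constant h (from the text)

def sample_size_cm(nx=24, ny=27, nz=22):
    """Return (Lx, Ly, Lz) in centimetres under the assumed unit layout."""
    lx = nx * A_MM                        # assumed: 24 units along x
    ly = ny * A_MM * math.sqrt(3) / 2     # assumed: 27 triangular-lattice rows along y
    lz = nz * H_MM                        # 22 trilayer units along z
    return tuple(round(v / 10.0, 1) for v in (lx, ly, lz))

size = sample_size_cm()
```

Under these assumptions the result rounds to the quoted 70.6 cm × 68.7 cm × 64.7 cm.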

In Fig. 2g we show the bulk band dispersions for the Weyl phononic 
crystal along directions of high symmetry. This structurally simple 
phononic crystal has not only single Weyl points but also double Weyl 
points, which link together the lowest three bands (coloured lines). 
The single Weyl points, around which the dispersions are linear along 
all directions, emerge at K (or K’) and H (or H’) points, which are 
not invariant under time reversal. By contrast, the double Weyl points, 
around which the dispersions become quadratic in the k_x–k_y plane, are 
located at Γ and A points, as dictated by the time-reversal symmetry. 
The stability of these Weyl points at the high-symmetry momenta is a 
direct consequence of the three-fold screw symmetry of the Weyl phononic 
crystal. The topological charges of these Weyl points, as sources or 
sinks of Berry flux, can be calculated either by integrating Berry curvatures 
or by analysing the rotational eigenvalues at their high-symmetry 
momenta. These topological charges and their distribution (Fig. 2g, 
h) lead to topologically non-trivial SAWs on the truncated surfaces 
parallel to the z direction (Extended Data Figs. 1, 2). This is demonstrated 
in Fig. 2i–k by the SAW dispersions (green lines) simulated at 
k_z = 0.5π/h (where h is the lattice constant along the z direction) for 
the XZ1, YZ1 and XZ2 surfaces, which traverse a wide gap between the 
lowest two bulk bands. Because all of these dispersions have positive 
slopes, the k_z-fixed SAWs propagate anticlockwise around the sample 
(viewed from the top). This one-way chiral behaviour of the SAWs can 
be understood by the fact that the spiral interlayer coupling in the phononic 
crystal introduces a synthetic gauge flux that threads through the 
sample along the z direction. The surface conditions greatly influence 
the properties of the SAW dispersions that are hosted by the XZ1 and 
XZ2 surfaces (Fig. 2i, k). This is further demonstrated in Fig. 2l, n by 
the open EFCs (green lines) simulated at 5.75 kHz, the frequency of 
the Weyl points with topological charge +1. Specifically, for k_x > 0 the 
SAWs at the XZ2 surface always carry a downward component of the 
group velocity, in contrast to the upward component at the XZ1 surface. 
(For the SAWs with k_x < 0, the group velocities reverse automatically 
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an interface between two different naturally occurring homogenous media 
(HM). b, Negative refraction when the outgoing medium in a is replaced 
by a phononic crystal designed with a negative refractive index (NRPC). 

c, Topological positive refraction of SAWs at an interface formed by two 
neighbouring facets of a Weyl phononic crystal (WPC). Two open arcs 
related by time-reversal symmetry are illustrated for each facet. d, The 
same as c, but for topological negative refraction. Interfacial reflection 

is forbidden in c and d (indicated by a dashed arrow) owing to the open 
EFCs, in contrast to the unavoidable reflection in a and b with closed 
EFCs. 


owing to time-reversal symmetry.) Such contrasting features contribute 
substantially to the distinct SAW responses to the two one-dimensional 
edges of the YZ1 surface (shared with the XZ1 or XZ2 surfaces). In 
addition, these surface EFCs exhibit gentle curvature for a wide range 
of momenta and the majority of SAWs propagate towards nearly the 
same direction. This facilitates the experimental observation of positive 
and negative refraction of topologically protected SAWs. 

The presence of the topologically non-trivial SAWs is validated by 
our experiments with airborne sound. A broadband point-like sound 
generator is positioned under the cover plate to excite the SAWs and 
a subwavelength-sized sound probe is inserted inside the sample to 
scan the surface-pressure signal point by point. In particular, in each 
sample surface, the sound source is placed on a specific surface corner 
(Fig. 2d–f) to stimulate the one-way chiral SAWs. In this way, all SAWs 
that propagate rightwards with k_x > 0 are selectively excited. This not 
only reduces the finite-size effect but also silences the information from 
the time-reversal counterparts (that is, the SAWs with k_x < 0). Using a 
Fourier transformation of the near-field pressure distributions, we map 
out the SAW dispersions in the surface Brillouin zones. In Fig. 2i-k we 
present the experimental SAW dispersions (colour scale) in the k,-k, 
plane for k_z = 0.5π/h; Fig. 2l–n displays the measured surface EFCs at 
the Weyl frequency of 5.75 kHz. Excellent agreement is found between 
the experiments (colour scale) and simulations (green lines) for all 
three surfaces, despite band broadening due to the finite-size effect. 
The invisible excitation of the surface arc state with k_x < 0 also reflects 
negligible scattering from the well-excited surface arc state with k_x > 0. 

We now turn to the interfacial responses of the topologically protected 
SAWs. We carried out two comparative experiments, one for 
the interface between the XZ1 and YZ1 surfaces, and the other for the 
interface between the YZ1 and XZ2 surfaces. In Fig. 3 we show the 
measured near-field pressure distributions at 5.75 kHz. In each case, 
the point-like sound source is positioned at the bottom of the initial 
surface (XZ1 or YZ1) to excite the SAWs with k_x > 0, which propagate 
towards the upper right of the surface. As anticipated from the EFC 
analysis (inset in Fig. 3a), Fig. 3a displays a typical positive refraction 
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Fig. 2 | The Weyl phononic crystal and topologically protected SAWs. 
a, An image of the experimental sample. b, Schematic top view of the 
trilayer-based sample. XZ1, YZ1, XZ2 and YZ2 label the four side surfaces. 
c, Geometry of the unit cell, with a = h = 3b = 29.4 mm. d–f, Front views of 
the three surfaces XZ1, YZ1 and XZ2, respectively. At each surface, the red 
star denotes the position of a point-like sound source for experimentally 
generating one-way chiral SAWs and the coloured segments in the insets 
indicate the fine structures of the surface termination. g, Bulk band 
dispersions simulated along high-symmetry directions. The coloured 
lines represent the lowest three bands. h, The first bulk Brillouin zone 

of the Weyl phononic crystal and associated projected surface Brillouin 


effect when the sound signal traverses across the interface to reach the 
second surface YZ1, where the refracted beam travels upwards and 
deflects on the opposite side of the interface normal (white dashed line) 
with respect to the incoming beam. The finite widths of the incoming 
and outgoing beams in real space stem mostly from the finite lengths of 
the open EFCs with k_x > 0. By contrast, Fig. 3b shows negative refraction, 
with the outgoing beam propagating downwards and deflecting 
into the same side of the interface normal. This is consistent with the 
fact that the z component of the group velocity is reversed for the 
zones. The coloured spheres in g and h label Weyl points with different 
topological charges. i–k, Simulated SAW dispersions (green lines) at 
k_z = 0.5π/h for the three side surfaces XZ1, YZ1 and XZ2, respectively, 
agree very well with our measurements (bright colours in the colour scale, 
which represents the Fourier transformation of the measured pressure 
field). l-n, The corresponding EFCs in the extended surface Brillouin 
zones, simulated and measured at the Weyl frequency of 5.75 kHz. The 
grey regions display the projected bulk bands, the blue spheres label 

the projected Weyl points K and K’, and the green arrows indicate the 
directions of the SAW group velocities. 


SAWs travelling from the YZ1 to the XZ2 surface (inset in Fig. 3b). 
For both cases, the measured incident and refracted angles agree well 
with predictions from the simulated and measured EFCs (Fig. 2l–n); 
these angles give estimated refraction indices of 0.6 and −2.2 for the 
cases of positive and negative refraction, respectively. In particular, for 
each interfacial system, no clear reflection signal is observed near the 
interface, unlike the conventional wave response to an interface. The 
reflection immunity to the time-reversal-related surface arc states is 
further confirmed by the Fourier transforms of the pressure fields on 
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Fig. 3 | Experimental observation of topological negative refraction. 
a, b, Positive (a) and negative (b) responses of SAWs to the two one-dimensional 
edges of the YZ1 surface (shared with the XZ1 and XZ2 
surfaces, respectively). The near-field pressure distributions (colour 
scale) are measured at 5.75 kHz. The white stars label the positions of the 
point-like sound sources and the white arrows indicate the directions 

of the propagating beams. Insets, schematics of the echo-free interfacial 
phenomena, illustrated according to the EFC analysis in Fig. 1. The 
solid and dashed curves depict the simulated EFCs at 5.75 kHz and 

5.85 kHz, respectively. In both cases, the experimentally measured beam 
propagations agree well with the EFC analysis and exhibit negligible 
interfacial reflections. 


the incident surfaces. Physically, such interfacial phenomena benefit 
from the open nature of the surface arcs of a Weyl crystal. The drop 
in pressure amplitude at the interface is probably caused by the wave 
scattering into the bulk states and the mismatch of SAW properties 
between different surfaces. The signal reduction as the beam propagates 
stems mostly from unavoidable material absorption, broadening of the 
finite-width beam and scattering into bulk states. In addition to fab- 
ricating a cleaner sample, improvements can be achieved by injecting 
a sound source with a narrower range of k, that is not carried by any 
projected bulk state. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
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METHODS 


Numerical simulations. All full-wave simulations are performed using a com- 
mercial solver package (COMSOL Multiphysics). The epoxy material used in our 
experiments is safely modelled as an acoustically rigid material, given the great 
impedance mismatch with respect to the air background. The speed of sound in 
air of 346 m s⁻¹ is used for our experiment at room temperature (23 °C). The bulk 
band dispersions (Fig. 2g) are calculated for a single unit cell (Fig. 2c) with Bloch 
boundary conditions in all three-dimensional directions. To calculate the k_z-fixed 
surface band dispersions (Fig. 2i-k), we consider infinitely large slab structures 
with terminations and thicknesses specified in the text. In each case, the slab is 
thick enough to avoid the coupling between the SAWs hosted by the two different 
surfaces. Rigid boundary conditions are applied to both slab surfaces, whereas 
Bloch boundary conditions are used for the remaining two directions, with fixed 
k_z and varying k_x or k_y. In addition to the projected bulk bands, each numerical 
configuration gives the SAWs for both surfaces simultaneously, which can be fur- 
ther distinguished by inspecting the surface localization of the field distributions. 
Similarly, the EFCs (Fig. 2l–n) are extracted by scanning the surface Brillouin 
zone at 5.75 kHz. In addition to the numerical data provided in Fig. 2, simulations 
of the bulk and surface dispersions are provided in Extended Data Figs. 1 and 3. 

Experimental measurements. To excite the SAWs experimentally, a broadband 
sound signal is launched from a narrow tube (radius of about 4.0 mm) that pen- 
etrates the epoxy plate covering the Weyl phononic crystal. The sound source 
behaves like a point source for the wavelength of about 60 mm used here. The 
surface field is scanned point by point through a microphone (radius of about 3.5 
mm, B&K Type 4187) that is inserted inside the sample. The field confinement 
of the topologically protected SAWs is checked by detecting the pressure distri- 
butions away from the surface (Extended Data Fig. 4). The scanning steps are 5.0 
mm, 25.4 mm and 29.4 mm along the x, y and z directions, respectively. Both the 
amplitude and phase of the local pressure field can be recorded and frequency- 
resolved by using a multi-analyser system (B&K Type 3560B). To map out the EFCs 
(Fig. 2l-n) in the surface Brillouin zone, a two-dimensional Fourier transformation 
is performed for the measured spatial pressure distributions at a given frequency. 
This also gives the frequency-dependent dispersion curves for fixed k_z (Fig. 2i–k). 
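
The Fourier-transform step described above can be sketched as follows. This is only an illustration on a synthetic single-plane-wave field, not the authors' analysis code: the grid size, the function names `dft2`, `peak_index` and `index_to_k`, and the sign convention of the transform are all assumptions. In practice the field would be the complex pressure (amplitude and phase) recorded at one frequency on the scanned surface.

```python
import cmath
import math

def dft2(field):
    """Plain two-dimensional discrete Fourier transform of field[ix][iz]."""
    nx, nz = len(field), len(field[0])
    spectrum = [[0j] * nz for _ in range(nx)]
    for mx in range(nx):
        for mz in range(nz):
            total = 0j
            for ix in range(nx):
                for iz in range(nz):
                    phase = -2j * math.pi * (mx * ix / nx + mz * iz / nz)
                    total += field[ix][iz] * cmath.exp(phase)
            spectrum[mx][mz] = total
    return spectrum

def peak_index(spectrum):
    """Return the (mx, mz) bin with the largest spectral magnitude."""
    best = max((abs(v), mx, mz)
               for mx, row in enumerate(spectrum)
               for mz, v in enumerate(row))
    return best[1], best[2]

def index_to_k(m, n, step):
    """Convert a DFT bin index to a wavenumber (rad per unit of step)."""
    if m > n // 2:
        m -= n  # upper half of the bins represents negative wavenumbers
    return 2 * math.pi * m / (n * step)

# Synthetic field: one plane wave carrying 3 periods along x and 2 along z.
N = 12
field = [[cmath.exp(2j * math.pi * (3 * ix + 2 * iz) / N) for iz in range(N)]
         for ix in range(N)]
peak = peak_index(dft2(field))
```

Repeating the transform at a fixed frequency and plotting the spectral magnitude over the whole (k_x, k_z) grid gives an EFC map; repeating it over many frequencies at fixed k_z gives a dispersion curve.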
Discussions on our sample design. Here we provide a brief discussion on our 
woodpile Weyl phononic crystal, which is structurally simple, easily fabricated and 
convenient for field scanning in our sound experiments in air. This Weyl phononic 
crystal system, which relies on the three-fold screw symmetry and is beyond the 
description of a tight-binding model, exhibits a clean bulk band structure and 
a broad topologically non-trivial frequency window (Extended Data Fig. 1). As 
a consequence of the crystalline symmetry, the single and double Weyl points 
are both located at high-symmetry points that are well separated in momentum 
space. The three-fold screw symmetry dictates that the single Weyl points stay at 
the non-time-reversal-invariant high-symmetry points, such as K (or K’) and H 
(or H’), and that the double Weyl points stay at the time-reversal-invariant 
high-symmetry points, such as Γ and A. Such symmetry requirements also provide extra 
protection of the Weyl points against pairwise annihilation. These merits (such as 
a wide energy window, fixed momenta of Weyl points and long distances between 
them) greatly facilitate numerical and experimental searches for Weyl points, and 
the confirmation of surface arc states and associated interfacial phenomena. 

In addition, the single or double Weyl points of opposite charge sit at different 
frequencies (see Fig. 2g), owing to the chiral nature of the phononic crystal structure. 
This is very different from the Weyl semimetal TaAs, in which the Weyl 
points of opposite sign emerge at the same energy because of the mirror symmetry. 
In solid systems, chiral Weyl semimetals that lack mirror symmetries are very 
attractive because they are predicted to host a quantized circular photogalvanic 
effect. Efforts have been devoted to exploring Weyl semimetals with chiral structures. 
These include the so-called Kramers Weyl semimetals, in which the 
Weyl points emerge stably at the time-reversal-invariant points. (This mechanism 
requires strong spin-orbit coupling and therefore differs from our acoustic system, 
which is essentially spinless.) In trigonal tellurium and selenium, the 
Kramers-theorem-enforced Weyl semimetal phase takes place only under additional pressure. 
Recently, more material candidates for Kramers Weyl semimetals have been 
revealed in a theoretical study. Other structurally chiral solid systems have also 
been proposed, such as CoSi and SrSi₂, but have complex band structures or 
are difficult to grow as single crystals. The design route of our Weyl phononic 
crystal could potentially serve as a reference for further searches for chiral Weyl 
crystals in solid-state systems. 

Data availability. The data that support the findings of this study are available 
from the corresponding authors on reasonable request. 
Extended Data Fig. 1 | Simulated surface band dispersions in a wide 
frequency range. a–c, The data are evaluated at k_z = 0.5π/h for the three 
side surfaces XZ1, YZ1 and XZ2, respectively, the terminations of which are 
specified in the main text. Two gapless surface bands (green lines) traverse 
the two gaps between the lowest three projected bulk bands (grey regions). 
We focus on the lowest surface band in the main text. 
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Extended Data Fig. 2 | Berry flux distributions, Chern numbers and 
bulk boundary correspondence. a, Berry flux contributions to the lowest 
two bulk bands, derived from the numerically calculated Weyl charge 
distributions (see Fig. 2h). Here |C| labels the amplitude of the Weyl 
charge, and the red and black spheres indicate sources and sinks of Berry 
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fluxes, respectively. b, Chern numbers for the lowest two bulk gaps opened 
at k_z > 0, labelled according to the Berry flux distributions in a. (Here, 
the band structure is simulated at k_z = 0.5π/h.) The Chern numbers C_{k_z} 
of both gaps are −1, consistent with the topologically non-trivial surface 
spectra presented in Extended Data Fig. 1. 
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Extended Data Fig. 3 | Different EFC properties attained by 
engineering the surface terminations of the Weyl phononic crystal. 

a, Surface EFCs evaluated for the XZ surfaces at the Weyl frequency of 5.75 
kHz, where the parameter d, defined at the top of the column characterizes 
the surface truncation (black dashed line). The grey regions display the 
projections of bulk bands, the blue spheres display the projections of the 
Weyl points K and K’ at 5.75 kHz, and the arrows indicate the directions 
of the group velocities for the surface arc states. b, Similar to a, but for the 




YZ surfaces specified by d,. The evolution of the EFCs for different surface 
terminations shows various possibilities of manipulating the surface states 
according to their group velocities. We focus on the cases in a2, a4 and b4 
in the main text to attain the desired SAW properties. Throughout, we use 
5.75 kHz to shrink the momentum regions projected by the bulk bands, 
which is favourable for the experimental observation of the surface arc 
states and the associated interfacial phenomena. 
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Extended Data Fig. 4 | Experimental characterizations of the 
topologically protected SAWs. a-c, Pressure distributions on the sample 
surfaces XZ1, YZ1 and XZ2 (Fig. 2d–f), respectively, scanned step by step 
at 5.75 kHz. The white stars indicate the positions of the point-like sound 
source. The propagation directions of the beams, as predicted from most 
of the SAWs hosted on the corresponding facets, are closely related to the 
positive and negative refractions observed in Fig. 3. The data are used to 
obtain the EFCs (with k_x > 0) in Fig. 2l–n through a Fourier transform. 
Similar data can be collected to obtain the frequency-dependent surface 
band dispersions at a given k_z (Fig. 2i–k). d–f, Decay signatures identified 
for the surface states in a-c. The data are measured along the normal 
directions of the sample surfaces, the in-plane coordinates of which are 
marked by the black circles (A—G) in a-c. The pressure magnitudes do not 
exhibit precise (oscillatory) exponential decay, mainly because the surface 
beam consists of many surface arc states with different out-of-plane decay 
lengths. The inset shows the k, dependence of the decay length / simulated 
for each surface. 
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Photoswitching topology in polymer networks with 
metal-organic cages as crosslinks 


Yuwei Gu, Eric A. Alt, Heng Wang, Xiaopeng Li, Adam P. Willard & Jeremiah A. Johnson* 


Polymer networks can have a range of desirable properties such as 
mechanical strength, wide compositional diversity between different 
materials, permanent porosity, convenient processability and broad 
solvent compatibility. Designing polymer networks from the 
bottom up with new structural motifs and chemical compositions 
can be used to impart dynamic features such as malleability or 
self-healing, or to allow the material to respond to environmental 
stimuli. However, many existing systems exhibit only one 
operational state that is defined by the material’s composition and 
topology; or their responsiveness may be irreversible and 
limited to a single network property (such as stiffness). Here 
we use cooperative self-assembly as a design principle to prepare 
a material that can be switched between two topological states. By 
using networks of polymer-linked metal-organic cages in which 
the cages change shape and size on irradiation, we can reversibly 
switch the network topology with ultraviolet or green light. This 
photoswitching produces coherent changes in several network 
properties at once, including branch functionality, junction 
fluctuations, defect tolerance, shear modulus, stress-relaxation 
behaviour and self-healing. Topology-switching materials could 
prove useful in fields such as soft robotics and photo-actuators as 
well as providing model systems for fundamental polymer physics 
studies. 

The topological connections between polymer network strands 
and junctions—including branch functionality, cyclic defects, entan- 
glements and density fluctuations—have long been implicated in 
dictating the bulk properties of polymer networks. With the advent
of new synthetic methods and tools for polymer network characterization,
recent studies have revealed that network topology, as one
of the few global parameters for polymer networks, plays a critical role
in determining material properties such as elasticity, the gel point,
network dynamics and degree of defect tolerance. Here we propose
and examine topology switching as a strategy to develop dynamic and 
stimuli-responsive materials in which multiple aspects of the materials’ 
properties are regulated coherently. 

We hypothesized that polymer metal-organic cage (polyMOC) networks,
which are a class of materials formed from the metallosupramolecular
assembly of discrete MOCs connected to polymer chains,
could provide a route to topology-switchable materials (Fig. 1a). In
polyMOCs, the network junctions are nanoscale metal-ligand (MxLy)
cages of defined shape and stoichiometry; the MOC structure
defines the polyMOC topology and bulk properties. Inspired by
recent developments in modulating MOC conformation in a stimuli-responsive
fashion, we began our investigation with the design and
synthesis of a poly(ethylene glycol)-based polymer ligand (Fig. 1b),
which features two bis-pyridyl dithienylethene (DTE) groups. We
hypothesized that, in the presence of Pd²⁺, these ligands would form
MOCs that could be reversibly photoswitched between small Pd3L6
rings and large Pd24L48 rhombicuboctahedra by irradiating with
green and ultraviolet (UV) light, respectively. Before embarking on
polyMOC synthesis using polymer ligands, we conducted model studies
with a non-polymeric bis-pyridyl DTE-derivative ligand to confirm the
photoswitching capability of these DTE derivatives (Extended Data
Fig. 1).

When 1 equivalent of the open-form polymer ligand (o-PL) was 
mixed with 1 equivalent of Pd(CH3CN)4(BF4)2 in CH3CN (25.1 mM
Pd²⁺), a dark-brown gel formed (named o-gel). After annealing at
60°C for 4 h to enable equilibration of the network junctions, the o-gel
was characterized with oscillatory rheometry (Fig. 2a, Extended Data 
Fig. 2a). The material displayed elastic behaviour at all frequencies with 
a storage modulus (G’) of about 8.3 kPa at 10 rad s⁻¹. Irradiation of
the o-gel with UV light for 5 h at 60°C produced a dark blue material, 
which we call c-gel (for closed-form), with a nearly doubled G’ 
(Fig. 2a). The shear loss modulus (G”) of the c-gel (Extended Data 
Fig. 2b) was slightly higher than that of the o-gel at the measured 
frequencies (1-100 rad s⁻¹). We attribute such behaviour to the presence
of a greater number of topological defects (for example cyclic
defects) in the c-gel, a characteristic feature for polyMOC gels with
high branch functionality, f. A sample of the c-gel prepared
directly from pre-synthesized closed-form polymer ligand (c-PL) 
and Pd²⁺ displayed nearly identical frequency behaviour (Extended
Data Fig. 2d), which suggests that the observed changes in mechanical 
properties on conversion of the o-gel to c-gel are due to photoswitching 
of the DTE-containing MOC junctions. Finally, exposure of the c-gel 
to green light for 5 h at 60°C regenerated the o-gel with only a slight 
decrease in stiffness compared with the original o-gel (G’ = 7.8 kPa;
Fig. 2a, Extended Data Fig. 2c). Notably, the o-gel and c-gel have the 
same molar concentration of polymer strands and Pd atoms. Although 
the magnitude of the change in G’ demonstrated here is relatively small 
(a factor of 1.7), in conventional supramolecular materials based on 
point interactions, the o-gel and c-gel would be expected to have the 
same G’; the hierarchical nature of polyMOCs enables access to two 
distinct topological states with different densities of elastically effective 
strands, cyclic defects and junction fluctuations, which leads to 
different G’. 

Small-angle X-ray scattering (SAXS) was used to confirm that the 
mechanical changes observed above were due to changes in the MOC 
structure within the o-gel and c-gel (Fig. 2b, c). The scattering curve 
for the o-gel shows a broad peak at scattering vector q = 0.080 Å⁻¹ and
a weak peak at q = 0.729 Å⁻¹ (Fig. 2b, inset). The latter peak agrees
with the form factor of a uniform spherical particle with radius of
0.79 nm. We attribute this peak to scattering from Pd3L6 MOCs within
the o-gel. The low-q SAXS peak, which corresponds to a d-spacing of 
about 7.85 nm, was assigned as the average distance between MOCs 
linked by polymer chains. 
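
The two SAXS assignments above can be checked with a short calculation. The sketch below is illustrative only: it assumes the usual relation d = 2π/q for the low-q peak and the normalized form factor of a uniform sphere for the high-q peak, and the function names are hypothetical rather than taken from the authors' analysis.

```python
import math

def d_spacing_nm(q_inv_angstrom):
    """Real-space distance d = 2*pi/q, converted from angstrom to nm."""
    return 2 * math.pi / q_inv_angstrom / 10.0

def sphere_form_factor(q, radius):
    """Normalized form factor of a uniform sphere (tends to 1 as q -> 0)."""
    x = q * radius
    if x < 1e-6:
        return 1.0
    return (3 * (math.sin(x) - x * math.cos(x)) / x**3) ** 2

def peak_position(radius, q_min, q_max, steps=20000):
    """Grid search for the q in [q_min, q_max] at which P(q) is largest."""
    best_q, best_p = q_min, -1.0
    for i in range(steps + 1):
        q = q_min + (q_max - q_min) * i / steps
        p = sphere_form_factor(q, radius)
        if p > best_p:
            best_q, best_p = q, p
    return best_q

# Low-q peak of the o-gel: q = 0.080 per angstrom.
d_o = d_spacing_nm(0.080)

# First secondary maximum for R = 0.79 nm = 7.9 angstrom, searched between
# the first two zeros of P(q), which lie near qR = 4.49 and qR = 7.73.
q_peak = peak_position(7.9, 4.5 / 7.9, 7.7 / 7.9)

print(f"d-spacing: {d_o:.2f} nm, form-factor peak: {q_peak:.3f} per angstrom")
```

Under these assumptions the low-q peak corresponds to a spacing close to the quoted 7.85 nm, and the first secondary maximum of a 0.79-nm sphere falls close to the observed q = 0.729 Å⁻¹.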

The SAXS curve for the c-gel displayed a sharpened peak shifted 
to lower q than for the o-gel, as well as five new peaks in the high-q 
regime (Fig. 2b, Extended Data Fig. 3a). The latter peaks were best 
fitted by the form factor of a uniform spherical particle with a 2.9-nm 
radius (Extended Data Fig. 3b), which agrees remarkably well with the 
analogous measurement by diffusion ordered NMR spectroscopy 
(DOSY) (3.1 nm; Extended Data Fig. 1c). The low-q SAXS peak for 
the c-gel corresponds to a d-spacing of 11.9 nm; this increased distance 
between MOCs compared with the o-gel reflects the fact that the larger 
Mn = 4.6 kDa


Fig. 1 | Design of polyMOCs with photoswitchable topology. 

a, Schematic illustration of photo-regulated interconversion between two 
different network topologies. Photoresponsive MOCs are introduced as 
junctions within polyMOCs. On UV irradiation, the MOC rearranges 

its structure from Pd3L6 to Pd24L48. Consequently, the network topology
(for example, average branch functionality and cyclic defects) is altered. 
Reversal of the MOC structure with green light regenerates the original 
network topology. b, Chemical structure of the photoresponsive polymer 
ligand and a schematic of MOC interconversion. The DTE moiety 


MOCs in the c-gel must be further apart to maintain a constant Pd con- 
centration. The sharpening and increased intensity of the low-q SAXS 
peak for the c-gel is another sign that this material possesses uniform 
Pd-rich regions (Pd24L48 MOCs).

We carried out simulations of Pd3L6 (o-gel) and Pd24L48 (c-gel)
polyMOC gels (25.1 mM Pd²⁺) using a coarse-grained model of
dynamic network topology to probe the topology of our materials in
more detail (Fig. 2d, e, Extended Data Figs. 5, 6). Specifically, the model
was used to compute the density of loop defects and the average network
branch functionality (f).

Whereas 40.3% of the strands in the simulated o-gel were primary 
loops (that is, both ends of the strand were connected to the same Pd3L6
MOC; red strands in Fig. 2d), a remarkable 74.2% of the strands in the
simulated c-gel were primary loops (red strands in Fig. 2e). Despite 
this increase in elastically inactive primary loops, as noted above, G’ for 
the c-gel was 1.7 times that for the o-gel. In other words, on switching 
topology, the degree to which these materials could tolerate defects is 
changed (the c-gel can withstand more defects than the o-gel). Aiming 
to increase ΔG’ further, we prepared the o-gel’ by replacing 15% of
o-PL in the o-gel with free ligand o-L (these free ligands represent 
elastically inactive ‘defects’), as shown in Fig. 2f. Because of its low 
degree of defect tolerance, the o-gel’ has much lower G’ than the 
unmodified o-gel (about 1.2 kPa versus about 8.3 kPa). On irradiating 
with UV light, topology switching (see Extended Data Fig. 4d for SAXS 
data) results in a highly defect-tolerant state (c-gel’), increasing G’ by 
an order of magnitude (from about 1.2 kPa to about 9.8 kPa; Fig. 2f, 
Extended Data Fig. 4a, b). The switching process was confirmed to 
be reversible (Fig. 2f, Extended Data Fig. 4c). Given such a large and 
reversible ΔG’, we anticipate that topology-switching materials could
find applications in fields such as soft robotics and photo-actuators. 
Moreover, the incorporation of free ligand offers a general yet simple 
strategy to introduce functional moieties into polyMOC junctions. 
As branch functionality and junction size are altered considerably on 
switching topology, the possible number of functional moieties that 
undergoes electrocyclic ring-closure and ring-opening on UV and green-light
irradiation, respectively. Note that the ring-closure reaction produces
a racemic mixture of trans-ring-closed products. Photoinduced ring-closure/opening
leads to a change of bite angle between the two attached
pyridine groups. Hence, the open-form (o-PL) and closed-form (c-PL)
polymer ligands form small Pd3(o-PL)6 and large Pd24(c-PL)48 MOCs,
respectively, in the presence of Pd²⁺. The existence of many diastereomeric
cage structures does not seem to inhibit the assembly process.


can be confined in a MOC is increased or decreased; this could prove
useful for reversible encapsulation and release of small molecules, as
well as for responsive materials based on multivalent interactions.

The simulated f were compared with experimental results. To calcu- 
late experimental f from measured G’ values, two network models— 
an affine network and a phantom network—were applied, which assume 
different degrees of network junction fluctuation. As the network topo- 
logy switched from the o-gel to the c-gel, the local mass density around 
each network junction was greatly increased. Consequently, our results 
(Extended Data Fig. 6) indicate that the ability of the network junctions 
to fluctuate around average positions was diminished, which corresponds
to a reversible phantom-to-affine transition in polymer networks.
Because the difference between a phantom and an affine network forms
the basis of a thorough understanding of rubber elasticity, we anticipate
that the reversible phantom-to-affine transition achieved through topo- 
logy switching may be an ideal model system in which to examine these 
two network models experimentally. 

Owing to cooperative effects, the metal-ligand bonds in large MOCs 
are often much less dynamic than the same bonds in smaller complexes.
Thus, we expect the o-gel to be much more dynamic than the
c-gel, which provides the ability to switch network dynamics through
topology switching. Measurements of swelling in excess CH3CN solvent
(Fig. 3a) support our hypothesis: the o-gel absorbed 102 ± 13 times
its own weight in CH3CN after 5 days and completely dissolved after
20 days, whereas the c-gel absorbed 14 ± 6 times its own weight in
CH3CN after 5 days and did not swell further over 20 days.

Stress-relaxation measurements were conducted to quantitatively 
evaluate the dynamics of the o-gel and c-gel as a function of temperature
(Fig. 3b, c, respectively); characteristic relaxation times τ were
obtained by fitting experimental data to the Kohlrausch model. For
both the o-gel and c-gel, τ decreased as temperature increased, owing
to thermally accelerated ligand exchange. More importantly, at the
same temperature, τ for the c-gel was consistently about an order of
magnitude larger than for the o-gel. In addition, τ for the o-gel at
Fig. 2 | Photoswitching of polyMOC topology. a, A brown polyMOC
(o-gel) was formed by mixing o-PL and Pd(CH3CN)4(BF4)2 in CH3CN
(25.1 mM). The shear storage modulus (G’) of the o-gel was 8.3 kPa
(black curve). UV irradiation of the o-gel at 60°C for 5 h produced a
blue material (c-gel) with G’ = 14.1 kPa (red curve). Irradiation with
green light regenerated the o-gel with nearly the same G’ (blue curve).
The slight decrease in G’ on regeneration of the o-gel is believed to be
due to a small amount of ligand photodegradation. b, SAXS curves for
polyMOCs before (black curve) and after (red curve) UV irradiation.
The d-spacing in the low-q regime is assigned as the average distance
between MOC junctions. The high-q peaks correspond to form factors of
Pd3(o-PL)6 (inset) and Pd24(c-PL)48. c, SAXS curves over the course of UV


room temperature (τ = 555 s) was less than that for the c-gel at 60°C
(τ = 1,085 s). Thanks to topology switching, the static properties (G’)
and dynamic properties (τ) of this material could be simultaneously
switched, both by one order of magnitude. This design principle opens
many potential applications, such as soft adhesives, or biomaterials for
adaptive cell culture that can reversibly switch stiffness and stress relaxation
rate, both of which have been shown to be essential to regulate
stem-cell fate and cell proliferation.

These differences in relaxation kinetics between the o-gel and c-gel 
should translate to unique topology-dependent self-healing behav- 
iours; the slow exchange kinetics of the c-gel should make it less able 
to heal. To test this hypothesis, samples of the o-gel and c-gel were 
cut into two pieces and placed in contact; considering the difference 
in measured relaxation timescales between the o-gel and c-gel, the 
self-healing condition was chosen as 4 h at 40°C. After 4 h, the o-gel
was completely healed with no visible damage at the interface, whereas 
the c-gel remained in two separate pieces. In addition, our polyMOC 
could be switched from the healable state (o-gel) to the non-healable 
state (c-gel) and back (Fig. 3d). 

Most existing self-healing materials rely on fast bond exchange in the 
healable state. Although such fast bond exchange provides self-healing 
properties, the same mechanism leads to low mechanical robustness 
and considerable material deformation on loading, which makes 
self-healing materials unsuitable for many applications. Topology 
irradiation. Inset, zoom-in of the high-q regime of the SAXS curves. The
form factor corresponding to Pd24(c-PL)48 emerges over time, providing
direct evidence for the transition of network topology. d, Snapshot of the 
simulated o-gel. Primary loop strands are shown in red. e, Snapshot of 
the simulated c-gel. Primary loop strands are shown in red. f, Topology 
switching produces a marked switch in the degree of defect tolerance. By 
replacing 15% o-PL with o-L, we were able to switch G’ by one order of 
magnitude, from about 1 kPa to about 10 kPa, on switching the topology. 
Each experimental curve in a, b, c and f shows a representative result of at 
least three measurements, which showed excellent reproducibility with a 
relative standard deviation of less than 5%. 


switching is a general way to break this limitation. It allows a single 
material to be operated in one topological state (c-gel) that is mechanically
robust and suitable for heavier-duty applications; if the material is
damaged or needs to be re-processed, one can simply switch its topology
to the self-healable state (o-gel) to heal or process the material, and
then switch back to the operation state.

To study the fatigue behaviour of our materials, we measured G’ 
over several cycles of switching between the o-gel and c-gel using 
two sets of irradiation conditions. As shown in Fig. 4a (black line), 
when the polyMOC was cycled between exposure to UV light for 5 h 
at 60°C and green light for 5 h at 60°C, the material displayed elastic 
behaviour in both topological states for seven cycles. When cycled at 
lower temperature (45 °C) and longer irradiation time (12 h), the material 
lost its mechanical integrity after three cycles (Fig. 4a, red curve). 
These data are indicative of photoinduced degradation, probably of the 
DTE-containing ligands, because related DTE derivatives are known 
to degrade under extended UV irradiation. SAXS was used to probe
the fatigue mechanism of our polyMOCs in more detail. The scattering 
profiles for the o-gel and c-gel did not noticeably change after three 
switching cycles (Fig. 4b); however, the scattering profiles after seven 
switching cycles (green and yellow curves in Fig. 4b) no longer possess 
the MOC form factors, suggesting that UV-induced DTE degradation 
disrupted uniform MOC formation. The presence of low-q SAXS 
peaks after seven cycles suggests the presence of metal-ligand clusters 
Fig. 3 | Photoswitching topology leads to tunable network dynamics.
a, Swelling before (o-gel) and after (c-gel) UV irradiation. b, Stress-relaxation
curves at various temperatures for the o-gel. Curves were fitted
with the Kohlrausch model to provide the characteristic relaxation times τ.
Symbols, raw data; solid lines, fitted data. c, Stress-relaxation curves at


of ill-defined structure; these clusters may still provide mechanical 
robustness to the material. 

To elucidate the fatigue mechanism in more detail, f values were 
calculated based on G’ values measured over several switching cycles 
various temperatures for the c-gel. Each experimental curve in b and c
shows a representative result of at least three measurements, which showed 
excellent reproducibility with a relative standard deviation of less than 5%. 
d, Photographs of self-healing experiments, which demonstrate that the 
o-gel undergoes self-healing whereas the c-gel does not. 


(Fig. 4c). After seven cycles, the c-gel still possesses a very high f value
(6.26 ± 0.51). In contrast, f for the o-gel after seven cycles (2.22 ± 0.07)
is close to the f = 2 limit at which the sol-gel transition starts to occur.
To model the extent of UV-induced DTE degradation, dynamic 
Fig. 4 | Fatigue properties. a, Shear modulus G’ over multiple topology
switching cycles. Each switching cycle includes UV irradiation (purple 
regions) followed by green-light irradiation (white regions). Two different 
irradiation schemes were applied for each UV or green-light irradiation. 
Higher temperature enabled reduction of the irradiation time, which gave 
improved fatigue resistance. Error bars represent the standard deviation of 
three measurements. b, SAXS curves for topology switching cycles. Each 
experimental curve shows a representative result of at least three 
measurements, which showed excellent reproducibility with a relative
standard deviation of less than 5%. c, Average branch functionality (f)
calculated on the basis of G’ over multiple switching cycles (black curves), 
and simulation results based on the experimentally calculated f for 

the ligand degradation profile over multiple switching cycles (red curves). 
In the calculation of f , the c-gel and o-gel are treated as affine and 
phantom networks, respectively. Error bars represent the standard 
deviation of three measurements. 
network topology simulations were used to compute f as a function of
the percentage of active ligand within these polyMOCs (Extended Data 
Fig. 7); f values obtained from experiment were then translated to a 
ligand degradation profile (Fig. 4c). From these data, we estimate that 
25% of the DTE-based ligand was degraded over seven cycles. This 
result is consistent with the degradation profile of free ligand obtained 
by ¹H NMR (Extended Data Fig. 8).
Any Methods, including any statements of data availability and Nature Research
are available in the online version of the paper at https://doi.org/10.1038/s41586-
METHODS
A complete set of detailed synthetic procedures for polymer ligand (PL) and spectral
data is available in Supplementary Information.

Materials and instrumentation. We purchased all deuterated solvents from 
Cambridge Isotope Laboratories; 2-chloro-5-methylthiophene from Alfa Aesar; 
and 4-bromopyridine hydrochloride from Ark Pharm. All other reagents and sol- 
vents were purchased from Sigma-Aldrich. Anhydrous, degassed dichloromethane 
(DCM) and tetrahydrofuran (THF) were used from a J.C. Meyer solvent purifi- 
cation system. High-performance liquid chromatography (HPLC) grade DCM 
and THF were sparged vigorously with argon for at least one hour before being 
connected to the solvent purification system. All reactions were performed using 
standard Schlenk techniques and anhydrous solvents. 

UV irradiation was performed with 8 W SANKYO DENKI Blacklight UV lamps 
with peak emission at 300 nm. Visible light irradiation was performed with four 
3-W Sunlite 80146 green LED light bulbs. 

All chromatography was performed on EMD Millipore silica gel 60, particle size 
0.040-0.063 mm (230-400 mesh), on a Biotage Isolera Prime flash purification 
system. Gel permeation chromatography (GPC) was performed on an Agilent 1260 
LC stack equipped with an Agilent multi-wavelength UV-vis detector, a Wyatt 
TrEX refractive index detector, a Wyatt DAWN EOS 18-angle light scattering detec- 
tor and two Shodex KD-806M GPC columns. The GPC system was set to 60°C at a 
flow rate of 1 ml min⁻¹ with 0.025 M LiBr in DMF. UV-vis spectra were obtained
from a Varian Cary 50 Scan UV-vis spectrophotometer. 

We acquired ¹H nuclear magnetic resonance (¹H NMR) and ¹³C nuclear magnetic
resonance (¹³C NMR) spectra on a 500-MHz Varian INOVA spectrometer
and processed them in MestReNova 11.0.4. We acquired ¹H DOSY spectra on the
same spectrometer. The gradients were increased from 3.6 G cm⁻¹ to 29.2 G cm⁻¹
in equally spaced steps using 16 scans. Gradient pulse (δ) was set to 2.0 ms and
diffusion time (Δ) set to 0.1 s. Diffusion coefficients for resolved ¹H signals were
extracted from decay curves using the ‘peak height fit’ in the DOSY Transform module
of MestReNova 11.0.4.
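
To show how a diffusion coefficient follows from such decay curves, the sketch below uses the textbook Stejskal-Tanner expression for rectangular gradient pulses, I/I0 = exp[-D (γ g δ)² (Δ - δ/3)]. This is an assumption for illustration: the actual pulse sequence and the fitting routine inside MestReNova may differ, the number of gradient steps (16 here) is hypothetical, and the signal is synthetic.

```python
import math

GAMMA_1H = 2.675e8  # proton gyromagnetic ratio in rad s^-1 T^-1

def b_value(g_gauss_per_cm, delta=2.0e-3, big_delta=0.1):
    """Stejskal-Tanner b factor in s m^-2 (rectangular pulses assumed)."""
    g = g_gauss_per_cm * 1e-2  # 1 G/cm equals 1e-2 T/m
    return (GAMMA_1H * g * delta) ** 2 * (big_delta - delta / 3)

def attenuation(D, g_gauss_per_cm):
    """Relative signal I/I0 for diffusion coefficient D in m^2 s^-1."""
    return math.exp(-D * b_value(g_gauss_per_cm))

def fit_diffusion(gradients, intensities):
    """Least-squares slope through the origin of ln(I/I0) against -b."""
    bs = [b_value(g) for g in gradients]
    logs = [math.log(i) for i in intensities]
    return -sum(b * y for b, y in zip(bs, logs)) / sum(b * b for b in bs)

n_steps = 16  # hypothetical number of gradient increments
gradients = [3.6 + (29.2 - 3.6) * i / (n_steps - 1) for i in range(n_steps)]

# Synthetic, noise-free decay for a species with D = 4.65e-10 m^2 s^-1.
signal = [attenuation(4.65e-10, g) for g in gradients]
D_fit = fit_diffusion(gradients, signal)
print(f"recovered D = {D_fit:.3e} m^2 s^-1")
```

With δ = 2.0 ms and Δ = 0.1 s, a species diffusing at a few 10⁻¹⁰ m² s⁻¹ loses a substantial fraction of its signal at the strongest gradient, which is what makes the decay fit well conditioned.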

High-resolution mass spectrometry was done on a Waters quadrupole time-of-flight
Premier instrument. Each sample was dissolved in CH3CN and analysed by
direct flow injection (injection volume = 1 µl or 5 µl) electrospray ionization (ESI)
in the positive mode. The optimized condition was as follows: ESI capillary voltage 
3,000 V, sample cone voltage 35 V, source temperature 120°C and desolvation 
temperature 350°C. 

Mass spectrometry of the solution self-assemblies between Pd²⁺ and ligand was
recorded with a Waters Synapt G2 spectrometer, using 0.5-mg complex sample
in 1 ml of CH3CN/CH3OH (3:1, v/v). The following parameters were used: ESI
capillary voltage 3.0 kV; sample cone voltage 20 V; extraction cone voltage 0.1 V;
source temperature 100°C; desolvation temperature 100°C; desolvation gas flow
700 l h⁻¹ (N2); sample flow rate 10 µl min⁻¹.

Frequency sweep rheology experiments were performed on a TA Instruments 
Discovery Hybrid Rheometer HR-2. The rheometer was outfitted with an active 
temperature control system with an environmental enclosure for temperature 
control. A parallel-plate geometry (radius 8 mm) was used and coupled with a 
bottom plate, with the typical gap of 1.00 mm between the two plates. Frequency 
sweep experiments were performed from 1 rad s⁻¹ to 100 rad s⁻¹ at 0.5% strain,
which was first confirmed to be in the linear viscoelastic regime using strain sweep
experiments. Shear modulus G’ was determined based on G’ values at 10 rad s⁻¹.
Stress-relaxation experiments were carried out at 2% strain. The loaded gel samples
were immersed in mineral oil to reduce solvent evaporation and gel de-swelling
by adsorbing moisture. The stress relaxation curves were fitted to a stretched exponential
function, $G(t) = G_0 \exp[-(t/\tau)^{\beta}]$ (β is the stretching exponent), from which a
characteristic relaxation time τ could be
obtained. In all the fittings performed, R² was greater than 0.99.
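
As an illustration of how τ can be extracted from a stretched-exponential decay, the sketch below linearizes the Kohlrausch function, ln[-ln(G/G0)] = β ln t - β ln τ, and fits a straight line by ordinary least squares. This is a simplified stand-in, not the authors' fitting procedure (which is not specified beyond the model): it assumes G0 is known, and the values G0 = 8.3 and β = 0.6 are hypothetical; only τ = 555 s is taken from the text.

```python
import math

def kohlrausch(t, g0, tau, beta):
    """Stretched exponential G(t) = G0 * exp(-(t/tau)**beta)."""
    return g0 * math.exp(-((t / tau) ** beta))

def fit_kohlrausch(times, moduli, g0):
    """Estimate (tau, beta) from the line ln(-ln(G/G0)) = beta*ln(t) - beta*ln(tau)."""
    xs, ys = [], []
    for t, g in zip(times, moduli):
        ratio = g / g0
        if t > 0 and 0 < ratio < 1:  # keep points where both logarithms exist
            xs.append(math.log(t))
            ys.append(math.log(-math.log(ratio)))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    tau = math.exp(-intercept / beta)
    return tau, beta

# Synthetic, noise-free relaxation curve sampled on a logarithmic time grid.
times = [10 * 1.5**i for i in range(15)]
data = [kohlrausch(t, 8.3, 555.0, 0.6) for t in times]
tau_fit, beta_fit = fit_kohlrausch(times, data, 8.3)
print(f"tau = {tau_fit:.1f} s, beta = {beta_fit:.2f}")
```

For noisy experimental data a nonlinear least-squares fit of the untransformed curve is usually preferable, because the double logarithm amplifies noise at short and long times.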

Transmission SAXS was conducted at beamline 12-ID-B at the Advanced 
Photon Source at Argonne National Laboratory. The photon energy was 14 keV 
(that is, wavelength was 0.8857 Å), the beam size was 60 × 200 µm², and the detectors
used were Pilatus 2M (SAXS) and Pilatus 300K (wide-angle X-ray scattering,
WAXS). The sample-to-detector distances were calibrated using silver behenate
(AgBe); for the SAXS detector, this was 3,612.22 mm. The exposure time was set 
to 0.2 s during data collection. A small amount of polyMOCs was removed with 
a spatula to fill the hole of a circular washer that acted as a sample holder (outer 
diameter 24 mm, inner diameter 2 mm, thickness 1 mm). To minimize the sol- 
vent evaporation on such small-volume samples, the measurement was conducted 
immediately after the sample was loaded onto the washer. 
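
The beamline parameters above fix the mapping from detector position to scattering vector. The sketch below is a minimal illustration assuming the standard relations λ = hc/E and q = 4π sin(θ)/λ with 2θ = arctan(r/L); the 40-mm detector radius is a hypothetical value chosen only to show the order of magnitude.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV * angstrom

def wavelength_angstrom(energy_kev):
    """Photon wavelength in angstrom from photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev

def q_from_radius(r_mm, distance_mm, wavelength):
    """Scattering vector q = 4*pi*sin(theta)/lambda, with 2*theta = atan(r/L)."""
    two_theta = math.atan2(r_mm, distance_mm)
    return 4 * math.pi * math.sin(two_theta / 2) / wavelength

lam = wavelength_angstrom(14.0)
q = q_from_radius(40.0, 3612.22, lam)  # 40 mm from the beam centre (hypothetical)
print(f"wavelength = {lam:.4f} angstrom, q = {q:.4f} per angstrom")
```

With the 3,612.22-mm SAXS distance, a point a few centimetres from the beam centre already corresponds to q near 0.08 Å⁻¹, the region of the low-q peak discussed in the main text.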

Swelling measurements were carried out in CH3CN: 0.3 ml of as-formed 
polyMOC (before or after UV irradiation, 25.1 mM PL concentration) in a 4-ml 
vial was swollen in CH3CN at room temperature. After 5 days, the CH3CN that was 
not absorbed by the polyMOC was decanted. The inner wall of the vial was care- 
fully wiped to remove the solvent adhering to the wall. The weight of the gel was 
measured by subtracting the weight of the vial. The polyMOC was further swollen
in CH3CN for 15 days. The same procedure was used to measure its swollen weight
at day 20. The swelling ratio was calculated as the ratio between the swollen weight
and dry weight, with the latter estimated on the basis of the amount of polymer
ligand and Pd(CH3CN)4(BF4)2 added during the synthesis of the polyMOC.
Solution self-assembly of model ligand with Pd²⁺. We used
1,2-bis(2-methyl-5-(pyridin-4-yl)thiophen-3-yl)cyclopent-1-ene (o-L) as a model compound to
study the photo-controlled self-assembly of our non-fluorinated DTE bis-pyridine
ligand and Pd²⁺.

To form the Pd3(o-L)6 complex, we mixed 2.8 ml o-L solution (4.29 mM in CD3CN)
with 0.2 ml Pd(CH3CN)4(BF4)2 solution (30.0 mM in CD3CN). On mixing, the 
solution quickly turned to light green. The mixture was stirred at 70°C for one 
hour. 

To form the Pd24(c-L)48 complex, 2.8 ml o-L solution (4.29 mM in CD3CN)
was first irradiated with UV light for one hour to form a dark purple solution.
¹H NMR confirmed that all the o-L was converted to c-L. We then added 0.2 ml
Pd(CH3CN)4(BF4)2 solution (30.0 mM in CD3CN) to the aforementioned c-L
solution. On mixing, the solution quickly turned to dark blue. The mixture was 
stirred at 70°C for one hour. 

Photoswitching from Pd3(o-L)6 to Pd24(c-L)48 was performed by irradiating
Pd3(o-L)6 solution (2 mM Pd²⁺ concentration in CD3CN) with UV light for
15 h. Photoswitching from Pd24(c-L)48 to Pd3(o-L)6 was performed by irradiating
Pd24(c-L)48 solution (2 mM Pd²⁺ concentration in CD3CN) with green LED light
for 10 h. It should be noted that the conversion from Pd3(o-L)6 to Pd24(c-L)48 is
slow. The same observation was made for an analogous system, whose authors
postulated that only the small amount of free o-L present at equilibrium in a solution
of Pd3(o-L)6 is able to undergo photocyclization. Under gradual but slow
shifting of the dynamic equilibrium, the latter reaction delivers the c-L that
subsequently reacts with Pd²⁺ to form Pd24(c-L)48.

From DOSY spectra, we measured that the diffusion value (D) for Pd3(o-L)6 is
(4.65 ± 0.51) × 10⁻¹⁰ m² s⁻¹, and that for Pd24(c-L)48 is (1.94 ± 0.26) × 10⁻¹⁰ m² s⁻¹
(Extended Data Fig. 1c).

To calculate the hydrodynamic radii, we used the Stokes-Einstein equation:

$$r = \frac{kT}{6\pi\eta D}$$

where r is radius, k is the Boltzmann constant, T is temperature, η is the dynamic viscosity
of CD3CN (3.69 × 10⁻⁴ Pa s) and D is the diffusion value estimated by the DOSY
experiment.

As a result, the hydrodynamic radius was estimated to be 1.27 nm for Pd3(o-L)6
and 3.05 nm for Pd24(c-L)48.
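
The hydrodynamic radii can be reproduced from the Stokes-Einstein relation, r = kT/(6πηD). The sketch below takes the diffusion values as 4.65 × 10⁻¹⁰ and 1.94 × 10⁻¹⁰ m² s⁻¹ and the viscosity as 3.69 × 10⁻⁴ Pa s; the temperature of 298.15 K is an assumption, since the measurement temperature is not stated in this section.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J K^-1

def hydrodynamic_radius_nm(D, eta=3.69e-4, temperature=298.15):
    """Stokes-Einstein radius in nm for D in m^2 s^-1 and eta in Pa s.

    The default temperature (298.15 K) is an assumed value.
    """
    return K_B * temperature / (6 * math.pi * eta * D) * 1e9

r_small = hydrodynamic_radius_nm(4.65e-10)  # Pd3(o-L)6
r_large = hydrodynamic_radius_nm(1.94e-10)  # Pd24(c-L)48
print(f"Pd3(o-L)6: {r_small:.2f} nm, Pd24(c-L)48: {r_large:.2f} nm")
```

Under the assumed temperature the two results agree with the quoted 1.27 nm and 3.05 nm, which also supports reading the diffusion values as being of order 10⁻¹⁰ m² s⁻¹.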
Synthesis of polyMOCs. o-gel. In a 4-ml vial, 49.0 mg o-PL (MW = 5,576) was
dissolved in 210 µl CH3CN, to which 90 µl Pd(CH3CN)4(BF4)2 solution (97.6 mM
in CH3CN) was added. The mixture was vortexed for 20 s before it was annealed at
60°C for 4 h to obtain the o-gel. Such gels were used for SAXS studies, swelling tests
and self-healing tests. To calculate the concentration of PL (or Pd²⁺) in the gels, the
volume of poly(ethylene glycol) is taken into account (the density of poly(ethylene
glycol) is taken as 1.0 g cm⁻³), and the concentration of PL is thus 25.1 mM.
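
The quoted gel concentration can be retraced with simple arithmetic. The sketch below assumes that the total volume is the sum of the solvent volume (210 µl + 90 µl) and the polymer volume computed from its mass and the stated density; the function and variable names are illustrative.

```python
def gel_concentration_mM(mass_mg, molar_mass, solvent_ul, density_g_per_cm3=1.0):
    """Concentration of polymer ligand in mM, counting the polymer's own volume."""
    amount_umol = mass_mg / molar_mass * 1000.0   # mg / (g/mol) = mmol, then to µmol
    polymer_ul = mass_mg / density_g_per_cm3      # 1 g/cm^3 equals 1 mg/µl
    total_ul = solvent_ul + polymer_ul
    return amount_umol / total_ul * 1000.0        # µmol/µl = mol/l, then to mM

pl_mM = gel_concentration_mM(49.0, 5576.0, 210.0 + 90.0)

# Amount of Pd delivered by 90 µl of a 97.6 mM stock, in µmol.
pd_umol = 90.0 * 97.6 / 1000.0
pl_umol = 49.0 / 5576.0 * 1000.0
print(f"[PL] = {pl_mM:.1f} mM, Pd:PL = {pd_umol / pl_umol:.3f}")
```

The result is about 25 mM, in line with the stated 25.1 mM, and the Pd-to-PL ratio comes out at essentially 1:1, as intended for the gel formulation.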

To prepare samples for shear rheology measurement, the as-mixed gels were
transferred to a Teflon mould (8 mm in diameter, 2 mm in height) and sandwiched
between two other Teflon sheets assisted by paper clips. The assembly was then
heated in an oven at 60°C for 4 h. A clear circular gel pad was obtained and loaded
on the rheometer. 
c-gel. We prepared a CH3CN solution of c-PL by UV irradiation of o-PL solution
for one hour, and used it without further purification. The c-gel was prepared analogously to
the o-gel by replacing o-PL with c-PL. 
o-gel’. This material was prepared in a similar fashion to the o-gel, with 15% o-PL
replaced by o-L (note that each o-PL has two ligands whereas each o-L only has
one ligand). The stoichiometry between Pd²⁺ and total amount of ligand was
controlled at 1:2.
Photoswitching between the o-gel and c-gel. To aid the diffusion of polymer-bound
ligands as well as the conversion from Pd3(o-L)6 to Pd24(c-L)48 (which
strongly relies on the dynamics and equilibrium of ligand exchange), the light 
irradiation was performed at elevated temperature. The o-gel was placed in a 
closed vial and placed on its side on a digitally controlled hot plate. The surface 
temperature of the hot plate was set to 60°C and closely monitored with an infrared 
thermometer. The conversion from the o-gel to c-gel was achieved in 5 h of UV 
irradiation. Similarly, when the c-gel was irradiated with green light for 5 h at 60°C, 
conversion to the o-gel was achieved. We used 5 h light irradiation at 60°C in most
of this work, except for some of the gels prepared for fatigue studies, for which 12 h 
light irradiation at 45°C was used. 

For the samples used in shear rheology measurements, the sandwiched-assembly 
method was again used to prevent the deformation of gel samples. Briefly, the gels 
were transferred to a Teflon mould (8 mm in diameter, 2 mm in height) and sand- 
wiched between two transparent glass slides assisted by paper clips. The assembly 
was then placed on a hot plate at 60°C and was irradiated for 5 h with UV light
(or green light). 

SAXS characterization of topology switching. The high-q regime of the SAXS
profile for the c-gel was assigned as the form factor of Pd24(c-L)48, and the peaks
were thus fitted with the form factor of a solid spherical particle, in which R is
the radius of the sphere:

$$P(q) \propto \frac{[\sin(qR) - qR\cos(qR)]^{2}}{(qR)^{6}}$$

The fitting results (Extended Data Fig. 3) show that the form factor of a spherical
particle with radius 2.9 nm agrees well with the five observed peak positions.
Simulation method. Each polymer chain has two ends, and a system's configuration
can be expressed as a function of the binding site positions, r, and the connectivity
between binding site positions via the polymers. Because each binding site
is connected to at most one other binding site, the polymer topology connecting
binding sites can be expressed as an adjacency vector A:

$$A = (A_1, A_2, \ldots, A_{N-1}, A_N)^{\mathrm{T}}$$


where N is the number of network node binding sites available to the polymer 
chain ends. If the two ends of a polymer are bound at sites i and j, then the sites are 
connected, and $A_i = j$ and $A_j = i$. It is also possible for one end of a polymer chain
to be bound to a network node binding site at a fixed position while the other end
of the chain is not bound to a network node (it is not at one of the sites ∈ [1, N]
with a defined position). In this case we define $A_i = 0$. The entropic potential for
the polymer chains is the same model for polyethylene glycol described in previous
work, and the probability for a polymer chain of length $l$ to have an end-to-end
distance $r$ is denoted $P_l(r)$.

To sample a network configuration, nodes are generated at random positions 
within a simulation volume subject only to an excluded volume constraint to pre- 
vent nodes from overlapping until a desired system density is reached. Ligand 
binding site positions are arranged around each node position according to the 
geometry of the node. Starting with all of the binding positions being available, 
$a = \{1, \ldots, N\}$, and no connections, $A = 0$, available positions are chosen one at 
a time at random, removed from the list of available positions, and paired with 
another available binding position (if a viable one exists), which is then also 
removed from the list of available binding positions. This process continues until 
all the binding sites at network nodes have been paired up (except in cases when 
no viable pair exists). The probability that the ligand on one end of a polymer 
chain will bind at position $r_i$ given that the ligand on the other end is at position 
$r_j$ is proportional to the probability of the free polymer of length $l$ having an 
end-to-end distance $r_{ij}$. The probability of position $i$ pairing with available position $j$ 
(assuming there is at least one viable pair available to $i$) is: 


$$P(A_i = j) = \frac{P_l(r_{ij})}{\sum_{k \in a \text{ such that } P_l(r_{ik}) > 0} P_l(r_{ik})}$$


Simulations used a cubic, fully periodic simulation cell with side lengths of 100 nm, 
containing 16,154 pairs of connected ligands mediated by a polymer potential 
parameterized for polyethylene glycol with 100 repeat units. The network nodes 
for a given simulation comprised 4, 6, 12, 24 or 48 binding positions arranged in 
the Pd₂L₄, Pd₃L₆, Pd₆L₁₂, Pd₁₂L₂₄ or Pd₂₄L₄₈ geometry, respectively. Results were 
averaged over 100 independent simulations for each set of conditions. In all the 
simulations performed, the concentration of Pd²⁺ was set to 25.1 mM and the 
stoichiometry between Pd²⁺ and PL was set to 1:1 (that is, [Pd²⁺]:[L] = 1:2), which 
is consistent with the experimental condition. 
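
The sequential pairing procedure described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the study: the end-to-end distribution is replaced by an ideal Gaussian chain with a hypothetical 0.35 nm segment length (the source uses a polyethylene glycol model from earlier work), separations beyond the contour length are treated as non-viable, and all function names are invented for the example.

```python
import math
import random


def gaussian_chain_density(r, n_segments=100, segment_length=0.35):
    """Stand-in for P_l(r): ideal-chain end-to-end vector density (nm^-3).

    ASSUMPTION: the Gaussian form and the 0.35 nm segment length are
    placeholders for the PEG-specific model referred to in the text.
    Distances beyond the contour length are treated as non-viable (P = 0).
    """
    if r > n_segments * segment_length:
        return 0.0
    mean_sq = n_segments * segment_length ** 2
    norm = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    return norm * math.exp(-3.0 * r * r / (2.0 * mean_sq))


def minimum_image_distance(p, q, box):
    """Distance between two points in a cubic, fully periodic cell of side `box`."""
    total = 0.0
    for a, b in zip(p, q):
        d = abs(a - b) % box
        d = min(d, box - d)
        total += d * d
    return math.sqrt(total)


def pair_binding_sites(positions, box, chain_density=gaussian_chain_density, rng=None):
    """Return the adjacency vector A (1-indexed; A[i] = 0 means site i is unpaired)."""
    rng = rng or random.Random()
    n = len(positions)
    adjacency = [0] * (n + 1)  # adjacency[0] is unused padding
    available = list(range(1, n + 1))
    while available:
        # Choose an available site at random and remove it from the list.
        i = available.pop(rng.randrange(len(available)))
        weights = [
            chain_density(minimum_image_distance(positions[i - 1], positions[k - 1], box))
            for k in available
        ]
        if not any(w > 0 for w in weights):
            continue  # no viable partner exists, so A_i stays 0
        # Partner j is drawn with probability P_l(r_ij) / sum_k P_l(r_ik).
        j = rng.choices(available, weights=weights, k=1)[0]
        available.remove(j)
        adjacency[i], adjacency[j] = j, i
    return adjacency


# Toy usage: two close pairs of sites separated by more than one contour length.
sites = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (50.0, 50.0, 50.0), (51.0, 50.0, 50.0)]
A = pair_binding_sites(sites, box=100.0, rng=random.Random(0))
```

In the toy case each site has only one viable partner, so the result does not depend on the random seed. The sketch omits node generation with excluded volume and the placement of binding sites around each node.
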

Phantom-to-affine network transition through topology switching. Along with 
the increase in primary loops described in the main text, simulation results indicate 
that the ultra-high functionality of the c-gel leads to an abundance of secondary 
loops (Extended Data Fig. 5b), wherein two MOCs are bridged by two or more 
strands. Unlike primary loops, which are elastically inactive, secondary loops do 
contribute to elasticity; however, the extent of their contribution depends on which 
theoretical model network is considered (for example phantom or affine net- 
work)!”3?. To address which theory was more appropriate to describe each topo- 
logical state of our material, we calculated the average total number of connections 
per MOC ($\bar{f}_{\mathrm{total}}$, an affine model-based approximation for $f$; Extended Data Fig. 5c) 
and the average number of active connections (where secondary loops are treated 
as one effective connection; Extended Data Fig. 5d) per MOC ($\bar{f}_{\mathrm{active}}$, a phantom 
model-based approximation), as a function of MOC stoichiometry for a range of 
simulated polyMOCs (Extended Data Fig. 6). We found that for polyMOCs derived from 
small MOCs, for example the o-gel, the difference between $\bar{f}_{\mathrm{total}}$ and $\bar{f}_{\mathrm{active}}$ is almost 
negligible. In contrast, the large number of secondary loops in the c-gel leads to an 
$\bar{f}_{\mathrm{total}}$ value that is three times larger than $\bar{f}_{\mathrm{active}}$. We postulated that in the c-gel 
topological state, the high density of polymer chains attached to each Pd₂₄L₄₈ junction may 
suppress junction fluctuations, which would lead to affine network behaviour. 
Moreover, junction fluctuations in the o-gel state would be best captured by the 
phantom network model*!”. In other words, our photoswitchable polyMOC could 
behave as two fundamentally different networks depending on its topological state. 

This hypothesis was supported by our rheological data. Measured G′ values 
for the o-gel and c-gel were used to calculate $\bar{f}_{\mathrm{phantom}}$ and $\bar{f}_{\mathrm{affine}}$ (see Supplementary 
Information). As shown in Extended Data Fig. 6, $\bar{f}_{\mathrm{phantom}}$ agrees well with $\bar{f}_{\mathrm{active}}$ for the 
o-gel. In contrast, $\bar{f}_{\mathrm{affine}}$ shows excellent agreement with $\bar{f}_{\mathrm{total}}$ for the c-gel. 

Fatigue studies. To study the fatigue properties of polyMOCs over switching 
cycles, we prepared several samples of the o-gel using the sandwiched-assembly 
method, where the circular gel in a Teflon mould was sandwiched between two 
transparent glass slides and annealed. Each o-gel was irradiated alternately with 
UV light and green light for several cycles (from zero to seven cycles) at 60°C (each 
irradiation took 5 h) or 45°C (each irradiation took 12 h). The polyMOC was then 
taken out of the mould and subjected to rheological or SAXS measurements. For 
each data point, averages and standard deviations were computed for three trials. 
To avoid the deformation of samples or solvent loss, samples were not re-used after 
measurements (that is, if seven switching cycles were examined, a total of 15 × 3 
gel samples were made using the sandwiched-assembly method). 

To simulate the fatigue behaviours, we set a certain fraction of the ligand as 
inactive as a starting parameter, then simulated the polyMOC self-assembly process 
and obtained the average branch functionality in the equilibrium stage. Given 
the postulation that the o-gel adopts phantom network behaviour while the c-gel 
adopts affine network behaviour, the simulated branch functionality for the o-gel 
is based on active connections (Extended Data Fig. 7a) and for the c-gel is based 
on total connections (Extended Data Fig. 7b). 

From our study on the small-molecule ligand (Extended Data Fig. 8), about 
80% of the ligand is still active after 18 h of UV irradiation. On the other hand, 
after seven switching cycles, the polyMOC has been exposed to UV irradiation for 
35 h in total. We postulate that the fatigue behaviour of our polyMOCs is mainly 
dictated by UV-induced side reactions of DTE. 
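
The bookkeeping behind the fatigue study can be checked with a few lines of Python. The sample count and cumulative UV exposure follow directly from the numbers quoted above; the last function is purely illustrative, since it assumes first-order decay of the active ligand calibrated on the single small-molecule data point (about 80% active after 18 h), which the source does not state. All function names are hypothetical.

```python
import math


def samples_required(cycles, trials=3):
    """Gel samples needed when every state is measured in triplicate without re-use."""
    # Pristine o-gel plus one state after each UV or green-light irradiation.
    states = 2 * cycles + 1
    return states * trials


def cumulative_uv_hours(cycles, hours_per_irradiation=5.0):
    """Total UV exposure after `cycles` full switching cycles (one UV step per cycle)."""
    return cycles * hours_per_irradiation


def active_fraction_first_order(hours, ref_fraction=0.80, ref_hours=18.0):
    """ASSUMPTION: first-order loss of active ligand, fixed by one reference point."""
    rate = -math.log(ref_fraction) / ref_hours
    return math.exp(-rate * hours)


n_samples = samples_required(7)                  # 15 states x 3 trials
uv_hours = cumulative_uv_hours(7)                # 7 UV steps of 5 h each
estimate = active_fraction_first_order(uv_hours)
```

Under that assumed rate law, roughly two-thirds of the ligand would remain active after 35 h of UV exposure. This is an order-of-magnitude guide only, because the small-molecule measurement and the gel experiments need not share the same kinetics.
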

Data availability. The data that support the findings of this study are available 
from the corresponding author on reasonable request. 

Extended Data Fig. 1 | Solution self-assembly of DTE-based bis-pyridine 
ligand and Pd²⁺. a, DTE-containing bis-pyridine photoswitch 
reversibly interconverts between open (o-L) and closed (c-L) forms. 
Note that the ring-closure reaction produces trans-isomers (racemic). 
In the presence of Pd²⁺, o-L and c-L form Pd₃(o-L)₆ and Pd₂₄(c-L)₄₈ 
MOCs, which can be interconverted using light. b, Aromatic regions 
of the solution ¹H NMR spectra (CD₃CN, 25°C, ω/2π = 500 MHz) for 
o-L and c-L, and corresponding MOCs prepared from these ligands and 
Pd(CH₃CN)₄(BF₄)₂. Photoswitching steps are indicated by black arrows. 
c, ¹H DOSY measurements indicate that Pd²⁺ forms a smaller assembly 
with o-L (green spectrum) and a larger assembly with c-L (blue spectrum). 
From a series of ¹H NMR spectra generated from ¹H DOSY measurements, 
the decay of peak intensities was fitted to provide the diffusion value (D) 
for Pd₃(o-L)₆ (green shaded region) and Pd₂₄(c-L)₄₈ (blue shaded region), 
respectively. From the measured diffusion values, the hydrodynamic 
radii of Pd₃(o-L)₆ and Pd₂₄(c-L)₄₈ were calculated. d, ESI mass spectrum 
of Pd₃(o-L)₆. The charge states of intact assemblies due to the loss of 
counterions are marked. Inset shows the simulated and observed isotopic 
patterns of [Pd₃(o-L)₆ + 2BF₄⁻]⁴⁺. e, Cold spray ionization mass spectrum 
of Pd₂₄(c-L)₄₈. The charge states of intact assemblies due to the loss of 
counterions are marked. 

Extended Data Fig. 2 | Frequency sweeps in oscillatory rheometry 
at 0.5% strain. a, Data for three o-gel samples. b, Data for three o-gel 
samples after UV irradiation. c, Data for three o-gel samples after UV 
irradiation followed by green-light irradiation. d, Data for three c-gel 
samples prepared directly from c-PL. 

Extended Data Fig. 3 | Fitting of high-q SAXS profile of the c-gel. a, High-q regime of the SAXS profile for the c-gel. Five peaks were identified (dashed 
lines) and indexed. b, Experimental results were fitted with the form factor of a spherical particle with radius 2.9 nm. 

Extended Data Fig. 4 | Topology switching in the presence of free ligand 
as defects. a, Frequency sweep in oscillatory rheometry at 0.5% strain for 
three o-gel′ samples. b, Frequency sweep in oscillatory rheometry at 0.5% 
strain for three o-gel′ samples after UV irradiation. c, Frequency sweep 
in oscillatory rheometry at 0.5% strain for three o-gel′ samples after UV 
irradiation followed by green-light irradiation. d, SAXS curves for the 
o-gel′ before and after UV irradiation. 

Extended Data Fig. 5 | Simulations of network topology. a, Snapshots of 
in silico polyMOCs before UV irradiation. Left, MOC junctions are shown 
as grey spheres and polymer chains connecting MOC junctions are shown 
in blue. Right, a zoom-in view of the region in the green cube in the left 
panel. Looped and non-looped polymer chains are shown in red and blue, 
respectively. b, Snapshots of in silico polyMOCs after UV irradiation. Left, 
MOC junctions are shown as grey spheres, and polymer chains connecting 
MOC junctions are shown in blue. Right, a zoom-in view of the region in 
the green cube in the left panel. Looped and non-looped polymer chains 
are shown in red and blue, respectively. The blue dashed circle shows a 
representative case in which two Pd₂₄L₄₈ clusters are connected by multiple 
polymer chains (that is, multiple secondary loops). c, A representative 
polymer network which is abundant in secondary loops. The connectivity 
of each junction is calculated on the basis of the total connections, which 
describes the number of polymer chains connecting MOCs. d, The 
same polymer network is represented in another way by calculating the 
connectivity of each junction on the basis of the active connections. 

Extended Data Fig. 6 | Simulated results for average connections per 
MOC for a series of polyMOCs with various PdₙL₂ₙ stoichiometries. 
Two types of connections between MOCs are defined: total connections 
and active connections. These connections correspond to two classical 
models of elasticity: the affine model (total connections, red curve) and 
the phantom model (active connections, black curve). Both affine (yellow 
stars) and phantom (blue stars) models were used to calculate the 

average branch functionality for the o-gel and c-gel based on measured G’, 


and the results were compared with simulated active connections and 
total connections. For the o-gel, the phantom-model based experimental 
calculation agrees well with simulated active connections; for the c-gel, the 
affine-model based experimental calculation agrees well with simulated 
total connections. The experimental and simulated results suggest that 

the o-gel is best described as a phantom network while the c-gel is best 
described as an affine network. 

Extended Data Fig. 7 | Simulation studies of fatigue behaviours. a, Simulation results for $\bar{f}_{\mathrm{active}}$ of Pd₃L₆ gel obtained by assuming that a certain fraction of 
ligand is inactive. b, Simulation results for $\bar{f}_{\mathrm{total}}$ of Pd₂₄L₄₈ gel obtained by assuming that a certain fraction of ligand is inactive. 

Direct arylation of strong aliphatic C-H bonds 

Ian B. Perry, Thomas F. Brewer, Patrick J. Sarver, Danielle M. Schultz, Daniel A. DiRocco & David W. C. MacMillan 


Despite the widespread success of transition-metal-catalysed 
cross-coupling methodologies, considerable limitations still exist 
in reactions at sp³-hybridized carbon atoms, with most approaches 
relying on prefunctionalized alkylmetal or bromide coupling 
partners)”. Although the use of native functional groups (for 
example, carboxylic acids, alkenes and alcohols) has improved 
the overall efficiency of such transformations by expanding the 
range of potential feedstocks*°, the direct functionalization of 
carbon-hydrogen (C-H) bonds—the most abundant moiety in 
organic molecules—represents a more ideal approach to molecular 
construction. In recent years, an impressive range of reactions 
that form C(sp³)-heteroatom bonds from strong C-H bonds has 
been reported’. Additionally, valuable technologies have been 
developed for the formation of carbon-carbon bonds from the 
corresponding C(sp³)-H bonds via substrate-directed transition-metal 
C-H insertion®, undirected C-H insertion by captodative 
rhodium carbenoid complexes’, or hydrogen atom transfer from 
weak, hydridic C-H bonds by electrophilic open-shell species!*4. 
Despite these advances, a mild and general platform for the 
coupling of strong, neutral C(sp³)-H bonds with aryl electrophiles 
has not been realized. Here we describe a protocol for the direct 
C(sp³) arylation of a diverse set of aliphatic, C-H bond-containing 
organic frameworks through the combination of light-driven, 
polyoxometalate-facilitated hydrogen atom transfer and nickel 
catalysis. This dual-catalytic manifold enables the generation of 
carbon-centred radicals from strong, neutral C-H bonds, which 
thereafter act as nucleophiles in nickel-mediated cross-coupling 
with aryl bromides to afford C(sp³)-C(sp²) cross-coupled products. 
This technology enables unprecedented, single-step access to a broad 
array of complex, medicinally relevant molecules directly from 
natural products and chemical feedstocks through functionalization 
at sites that are unreactive under traditional methods. 

Metallaphotoredox catalysis has recently emerged as an effective 
strategy for C(sp³)-H functionalization’>. Specifically, the merger 
of photoredox-mediated hydrogen atom transfer (HAT) and 
transition-metal catalysis has delivered several methods for the selective 
functionalization of activated C-H bonds based on low bond dissociation 
energies and/or polarity effects (α-heteroatom, benzylic and 
formyl)'°"“. Inspired by these studies and a strong oxidant-mediated 
protocol for C(sp³)-H arylation’’, we proposed that combining a HAT 
catalyst capable of generating high-energy carbon-centred radicals 
from strong, inert C-H bonds with the elementary steps of nickel cata- 
lysis (aryl oxidative addition, reductive elimination) would enable the 
coupling of aliphatic carbon frameworks with a range of aryl bromide 
coupling partners. 

We proposed that polyoxometalates (POMs), many of which possess 
high-energy excited states able to perform the desired C-H abstraction, 
would be ideal cocatalysts for the proposed transformation”. Of particular 
interest was the decatungstate anion ([W₁₀O₃₂]⁴⁻), a POM that 
has been broadly used as an efficient HAT photocatalyst in various oxygenations, 
dehydrogenations, conjugate additions and, more recently, 
fluorinations of strong, unactivated, aliphatic C-H bonds!8-?3 with 
bond dissociation energies of up to 100 kcal mol⁻¹ (for cyclohexane, 
ref. **). To our knowledge, the decatungstate anion has not previously 
been merged with transition-metal cross-couplings, and we hoped that 
such a combination of catalytic processes would enable access to a con- 
siderable breadth of carbon-centred radicals and aryl-functionalized 
products from abundant feedstocks (Fig. 1). Furthermore, the observed 
selectivity of decatungstate for the abstraction of electron-rich, steri- 
cally accessible C-H bonds” combined with the steric preference of 
nickel-catalysed cross-couplings suggested that our proposed dual- 
catalytic system could provide site-specific arylation of complex organic 
frameworks. 

A detailed description of our proposed mechanism is illustrated in 
Fig. 2. Photoexcitation of tetrabutylammonium decatungstate (TBADT, 1) 
followed by intersystem crossing would produce the triplet excited 
state (2) (with a lifetime, τ, of 55 ns)”°. Subsequent hydrogen atom 



Fig. 1 | Undirected aliphatic C-H arylation. a, Traditional transition-metal-catalysed 
C(sp³)-H arylation methods rely on adjacent or distal 
functionality to facilitate C-H bond activation. b, This dual-catalytic 
approach involves the combination of light-driven, polyoxometalate- 
facilitated hydrogen atom transfer and nickel catalysis. c, Use of this 
catalytic manifold enables the direct arylation of strong, unactivated C-H 
bonds. Ar, aryl; BDE, bond dissociation energy; Boc, tert-butoxycarbonyl. 

Fig. 2 | Reaction scheme and proposed mechanism for C(sp³)-H 
arylation via a dual polyoxometalate HAT and nickel catalytic 
manifold. The catalytic cycle begins with photoexcitation of the 
decatungstate anion 1 to provide triplet excited state 2. HAT from 
nucleophile 3 affords reduced photocatalyst 4 and open-shell species 5. 
Disproportionation of the reduced decatungstate species regenerates the 
active photocatalyst and affords the reducing hexa-anion 6. Ni⁰ species 


abstraction from an alkyl nucleophile such as norbornane (3) by excited- 
state decatungstate (2) would readily afford singly reduced decatung- 
state (4) and carbon-centred radical 5. Disproportionation of singly 
reduced decatungstate (4) would regenerate the active HAT photo- 
catalyst 1 and concurrently form doubly reduced decatungstate (6)”°. 
Two successive single-electron reductions of precatalyst Ni(dtbbpy)Br₂ 
(dtbbpy = 4,4′-di-tert-butyl-2,2′-bipyridine) ($E_{\mathrm{p}}$ (Niᴵᴵ/Ni⁰) = −1.47 V 
versus Ag/Ag⁺ in acetonitrile, see Supplementary Information) by 
doubly reduced decatungstate (6) ($E_{1/2}^{\mathrm{red}}$ ([W₁₀O₃₂]⁵⁻/[W₁₀O₃₂]⁶⁻) = 
−1.52 V versus Ag/Ag⁺ in acetonitrile, see Supplementary Information) 
could initially afford Ni⁰ species 7, which after capture of alkyl radical 5 
would furnish Niᴵ-alkyl species 8. Subsequent oxidative addition into aryl 
halide 9 by Niᴵ-alkyl species 8 would afford Niᴵᴵᴵ(aryl)(alkyl) species 10. 
Reductive elimination would provide the desired cross-coupled product 
11 as well as Niᴵ species 12. A final single-electron transfer step between 
this Niᴵ species and the doubly reduced polyoxometalate 6 would regenerate 
the active Ni⁰ catalyst 7, as well as singly reduced TBADT (4), 
closing both catalytic cycles. An alternative mechanism involving 
the oxidative addition of Ni⁰ catalyst 7 to aryl halide 9 could also be 
operative’’. 
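
To make the thermodynamic argument for the reduction of the nickel precatalyst concrete, the short Python sketch below converts the two potentials quoted above into an approximate free-energy change per electron using ΔG° = −nFΔE. It is a rough estimate under stated assumptions: the nickel value is treated as if it were a formal potential, both values are taken as referenced to the same Ag/Ag⁺ electrode in acetonitrile, and the function name is hypothetical.

```python
FARADAY = 96485.33212  # Faraday constant, C per mol of electrons


def electron_transfer_free_energy(e_acceptor, e_donor, n_electrons=1):
    """Approximate free energy (kJ/mol) for moving electrons from donor couple to acceptor couple.

    Uses dG = -n * F * (E_acceptor - E_donor), with both potentials in volts
    against the same reference electrode.
    """
    return -n_electrons * FARADAY * (e_acceptor - e_donor) / 1000.0


E_NICKEL = -1.47         # Ni(II)/Ni(0) value quoted in the text, V vs Ag/Ag+
E_DECATUNGSTATE = -1.52  # [W10O32]5-/[W10O32]6- value quoted in the text, V vs Ag/Ag+

delta_g_kj = electron_transfer_free_energy(E_NICKEL, E_DECATUNGSTATE)
delta_g_kcal = delta_g_kj / 4.184
```

The 0.05 V difference corresponds to a driving force of only about 5 kJ mol⁻¹ (roughly 1 kcal mol⁻¹) per electron, so on these numbers the reduction is downhill but only slightly.
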

We began our investigation into the proposed transformation 
by exposing 5-bromo-2-trifluoromethylpyridine and cyclohexane 
to near-ultraviolet light (Kessil 34 W 390 nm light-emitting diodes 
(LEDs)) in the presence of the commercially available HAT photocatalyst 
TBADT, Ni(dtbbpy)Br₂, and potassium phosphate in acetonitrile. 
To our delight, we observed a 70% analytical yield of the desired 
cyclohexyl C-H arylation product. Critical to the success of the reaction 
was the exclusion of both oxygen and water (see Supplementary 
Information); however, the use of standard benchtop techniques was 
sufficient in this regard. Moreover, although five equivalents of the 
C-H nucleophile afford optimal yields, lower substrate loadings 
can be used, albeit with diminished efficiency (see Supplementary 
Information). 

With optimized conditions in hand, we next sought to examine the 
scope of the transformation with respect to the C-H-bearing partner. 




7 subsequently captures alkyl radical 5, furnishing Niᴵ-alkyl species 8. 
Oxidative addition into aryl electrophile 9 provides Niᴵᴵᴵ species 10, 
which undergoes reductive elimination to afford the product (11) and 
Niᴵ-Br species 12. Single electron transfer (SET) between 6 and 12 
regenerates 7 and 4, closing both catalytic cycles. Alk, alkyl; dtbbpy, 
4,4'-di-tert-butyl-2,2'-bipyridine; equiv., equivalents; L, ligand; TBADT, 
tetrabutylammonium decatungstate; t-Bu, tert-butyl. 


As shown in Fig. 3, a diverse array of organic frameworks proved 
to be competent coupling partners for the C-H arylation protocol. 
Cycloalkanes with various ring sizes ranging from five to eight carbons 
were arylated in good yields (13-16, 57%-70% yield). Linear aliphatic 
systems were likewise successful in the protocol (17-20, 41%-56% 
yield), with a greater-than-statistical preference observed for arylation 
of the less sterically demanding 2-position for all substrates, including 
n-hexane (SI-1, 48% yield, 60% selectivity). Electron-withdrawing sub- 
stituents further improved this regiocontrol, highlighting the selectivity 
of decatungstate for more hydridic C-H bonds imparted by the elec- 
trophilic nature of its excited state*®. Accordingly, we found ketones to 
be particularly effective in modulating regioselectivity, affording prod- 
ucts that are functionalized distal to the electron-withdrawing carbonyl 
moiety (21, 22, 31-35, 31%-65% yield). 
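
The phrase "greater-than-statistical preference" can be illustrated for n-hexane with a short calculation. The Python sketch below compares the reported 60% selectivity for the 2-position with the share expected if every C-H bond reacted equally, either counting all fourteen hydrogens or only the eight secondary ones. The baseline definitions and names are illustrative assumptions; the source does not say which baseline the authors had in mind.

```python
# Hydrogen counts for the symmetry-distinct positions of n-hexane (C6H14).
N_HEXANE_HYDROGENS = {"C1/C6": 6, "C2/C5": 4, "C3/C4": 4}


def statistical_fraction(hydrogen_counts, site, candidates=None):
    """Share of product expected at `site` if all candidate C-H bonds were equally reactive."""
    candidates = candidates or list(hydrogen_counts)
    total = sum(hydrogen_counts[s] for s in candidates)
    return hydrogen_counts[site] / total


observed_c2 = 0.60  # selectivity for the 2-position reported in the text

# Baseline 1: all 14 hydrogens compete (4/14).
baseline_all = statistical_fraction(N_HEXANE_HYDROGENS, "C2/C5")
# Baseline 2: only the 8 secondary hydrogens compete (4/8).
baseline_secondary = statistical_fraction(
    N_HEXANE_HYDROGENS, "C2/C5", candidates=["C2/C5", "C3/C4"]
)

exceeds_statistical = observed_c2 > max(baseline_all, baseline_secondary)
```

Against either baseline (about 29% or 50%), the observed 60% lies above the purely statistical expectation.
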

This C-H arylation protocol was also found to effectively function- 
alize a range of electronically diverse primary and secondary benzylic 
C-H bonds, which were arylated in moderate to good yields (23-25, 
62%-71% yield, see Supplementary Information for three additional 
examples). Bridged bicyclic alkanes afforded arylated products with 
complete exo-selectivity (26-29 and 35, 40%-67% yield), probably 
owing to selective radical capture by nickel catalyst 7 on the less- 
hindered face. Functionalization of norbornane occurred selectively 
on the ethylene bridge (26, 61% yield). A bromide substituent on 
the bridging methylene of norbornane was tolerated and, moreover, 
strongly influenced site-selectivity, giving only the anti product (27, 
67% yield). Notably, heteroatom-containing bicycles afforded the 
desired products in moderate to good yields (28 and 29, 40% and 60% 
yield, respectively). Arylated lactam 29 was subsequently subjected to 
ring-opening reductive conditions to afford carbocyclic nucleoside 
analogue 30 (94% yield), highlighting the utility of the C-H arylation 
protocol. Intriguingly, adamantane derivatives underwent arylation 
predominantly at the methylene position (31 and 32, 48% and 53% 
yield, respectively), an unexpected chemoselectivity given that 
decatungstate-catalysed adamantane functionalization affords 5:1 
selectivity for methine positions when corrected for equivalent 

Fig. 3 | Scope of the alkyl nucleophile coupling partner. A broad range 

of C-H nucleophiles are selectively functionalized by this arylation 
protocol. Cyclic, acyclic and bicyclic aliphatic systems are amenable 
substrates. Heteroatom and carbonyl substituents electronically influence 
regioselectivity, and alkyl halides remain intact. All yields are isolated yields. 
Conditions as in Fig. 2. Green circles denote sites where notable amounts 


hydrogen atoms”. This result further highlights the role of the nickel 
catalyst in determining the regioselectivity of C-C bond formation, 
presumably via reversible radical capture and selectivity-determining 
reductive elimination’’. Four-membered rings were also competent 
substrates for this arylation protocol, with both an exocyclic ketone and 
a spirocyclic ketone affording the desired product in moderate yields 
(33 and 34, 42% and 31% yield, respectively). Tropinone, a common 
of other regioisomers are observed. See Supplementary Information 
for experimental details. Ac, acetyl; Boc, tert-butoxycarbonyl; d.r., 
diastereomeric ratio; Me, methyl; r.r., regioisomeric ratio. *>20:1 rr; 570% 
scaffold among natural products and pharmaceuticals”, was also effectively 
subjected to this dual-catalysis protocol (35, 61% yield). 

It is important to note that this transformation is not restricted 
to electronically neutral, unactivated C-H systems. Indeed, various 
α-heteroatom C-H nucleophiles were readily modified with excellent 
regioselectivity. As follows from the preference of decatungstate for the 
most hydridic and sterically accessible C-H bond, tert-butoxycarbonyl 

Fig. 4 | Scope of the aryl halide coupling partner. Various electronically 
diverse aryl bromides were functionalized in moderate to good yields, 
and unprotected polar functionalities such as alcohols and sulfonamides 
were well tolerated. Moreover, heteroaryl bromides including indoles, 
pyridines, pyrimidines and thiazoles were competent coupling partners in 


(Boc)-protected pyrrolidine was functionalized selectively at the 
α-amino position (36, 53% yield). Primary α-amino C-H nucleophile 
N-Boc dimethylamine was also found to be an effective substrate for 
the transformation (37, 68% yield). In addition to nitrogen-containing 
nucleophiles, various cyclic ethers were regioselectively functionalized 
in moderate to good yield at the α-oxy position (38-43, 48%-70% 
yield). Notable among these substrates, alkyl halides were well tolerated 
(41 and 42, 50% and 70% yield, respectively), opening avenues for 
subsequent synthetic manipulations. N-Boc-morpholine underwent 
C-H arylation predominantly at the α-amino C-H bond (43, 48% 
yield, 3.4:1 regioisomeric ratio (r.r.)). Useful amounts of the α-oxy 
product are generated in this case, in contrast to the quinuclidine-mediated 
triple catalytic arylation reported previously by our 
laboratory'® (see Supplementary Information) as well as benzophenone-mediated 
cyanation*®, wherein exclusive α-amino functionalization 
is observed. 

We next turned our attention to the scope of the aryl halide coupling 
partner. As shown in Fig. 4, a broad range of electron-deficient aryl bro- 
mides provided the desired products in good yield (44-49, 60%-70% 
yield). Furthermore, neutral and electron-rich substrates displayed 
useful coupling efficiencies (50-54, 52%-62% yield). Chlorine- and 


the transformation. Finally, the dual catalytic manifold was applied to the 
synthesis of several analogues of celecoxib. All yields are isolated yields. 
Conditions as in Fig. 2. See Supplementary Information for experimental 
details. “1.4:1 rr5°>20:1 dr. 


fluorine-bearing aryl bromides were alkylated selectively as well (55 
and SI-5, 50% and 55% yield, respectively), and free-alcohol-containing 
substrate 56 was also found to be a competent coupling partner (55% 
yield). ortho-Substituted aryl bromides were likewise alkylated in mod- 
erate to good yields (57 and 58, 45% and 71% yield, respectively). With 
respect to heteroaryl bromides, N-Boc-indole 59 underwent the desired 
transformation in useful efficiency (38% yield). A range of bromopyri- 
dines were alkylated in useful to good yields as well (60-69, 25%-64% 
yield). Bromopyrimidines were effective substrates (70 and 71, 55% and 
51% yield, respectively), and both electron-rich and electron-deficient 
2-bromothiazoles afforded the desired product in moderate yields (72 
and 73, 54% and 51% yield, respectively). Lastly, the pharmaceutically 
relevant aryl halide 74, a precursor to celecoxib, was subjected to the 
reaction conditions with various alkyl C-H nucleophiles. Cyclohexane, 
cyclohexanone and 7-bromonorbornane were all coupled in good effi- 
ciencies (75a-c, 60%-67% yield), which demonstrates the utility of the 
protocol in cross-coupling complex aryl fragments with structurally 
diverse C-H nucleophiles. 

Having demonstrated the applicability of the C-H arylation protocol 
to a broad array of C-H nucleophiles and aryl halide electrophiles, we 
next investigated its efficacy on naturally occurring aliphatic systems. 
As illustrated in Fig. 5a, various inexpensive, abundant natural products 
were successfully functionalized under our standard conditions, 
enabling the rapid arylation of complex stereodefined frameworks 
at carbon sites that lack adaptive functional handles. The observed 
regioselectivities were in accordance with the expected preferences 
of this dual-catalytic manifold, as described above. A moderate yield 
of heteroarene-coupled eucalyptol was observed (76, 55% yield), 
with a strong preference for arylation at the most hydridic 
and sterically accessible C-H bond. Useful efficiencies were observed 
for the terpene fenchone, which was also found to be a suitable 
substrate on 5.0-mmol scale (77, 38% and 41% yield, respectively; 
see Supplementary Information for experimental details). A free- 
alcohol derivative of fenchone was also readily used in this protocol 
(78, 52% yield), and camphene, a terminal-olefin-containing natural 
product, was also an effective substrate for arylation (79, 70% yield). 
Lastly, we sought to illustrate the generality of this method by derivatizing 
the lactone sclareolide with a range of aryl and heteroaryl bromides 
(80a-c, 35%-43% yield). Notably, an alkyl acid chloride provided 
the sclareolide-derived ketone in useful efficiency (80d, 33% yield), 
which illustrates the capability of our transformation to install a range 
of functionality onto complex aliphatic substrates without the need for 
directing groups. 

Finally, we demonstrated the application of this C-H arylation proto- 
col towards the rapid generation of complex pharmaceutically relevant 
molecules. Our target was the natural product epibatidine, a potent 
non-opioid analgesic. Owing to its high toxicity, epibatidine has seen 
limited potential as a commercial pharmaceutical? |. however, a range 
of epibatidine analogues have been investigated in a clinical setting”. 
We first targeted the synthesis of (±)-N-Boc-epibatidine from commercially 
available 7-azabicyclo[2.2.1]heptane. When N-Boc-protected 
amine substrate 81 and 2-chloro-5-bromopyridine were subjected to 
the reaction conditions, we observed an unoptimized 28% yield of 
protected epibatidine (82a) in two steps from the commercially avail- 
able unprotected amine (Fig. 5b, see Supplementary Information for 
experimental details). To our knowledge, this is the shortest formal 
synthesis of (±)-epibatidine in the multitude of reported procedures 
to date*’. Subsequently, we sought to demonstrate that diversification 
was possible by variation of both the alkyl fragment and the aryl bro- 
mide fragment. A representative sampling of heteroaryl bromides was 
coupled with bridged bicyclic amines to afford a small set of analogues 
in synthetically useful yields (82b to 83c, 21%-44% yield). We have 
developed a robust method for the construction of C(sp³)-C(sp²) bonds 
from alkane nucleophiles and aryl bromide electrophiles. We believe 
that these results demonstrate the potential to use unactivated C-H 
bonds as nucleophiles in transition-metal-catalysed cross-coupling 
transformations. 

The past two million years of eastern African climate variability 
is currently poorly constrained, despite interest in understanding 
its assumed role in early human evolution!. Rare palaeoclimate 
records from northeastern Africa suggest progressively drier 
conditions~» or a stable hydroclimate®. By contrast, records from 
Lake Malawi in tropical southeastern Africa reveal a trend of a 
progressively wetter climate over the past 1.3 million years”*. The 
climatic forcings that controlled these past hydrological changes 
are also a matter of debate. Some studies suggest a dominant local 
insolation forcing on hydrological changes®"!!, whereas others 
infer a potential influence of sea surface temperature changes in 
the Indian Ocean*!”!, Here we show that the hydroclimate in 
southeastern Africa (20-25° S) is controlled by interplay between 
low-latitude insolation forcing (precession and eccentricity) and 
changes in ice volume at high latitudes. Our results are based on 
a multiple-proxy reconstruction of hydrological changes in the 
Limpopo River catchment, combined with a reconstruction of sea 
surface temperature in the southwestern Indian Ocean for the past 
2.14 million years. We find a long-term aridification in the Limpopo 
catchment between around 1 and 0.6 million years ago, opposite to 
the hydroclimatic evolution suggested by records from Lake Malawi. 
Our results, together with evidence of wetting at Lake Malawi, 
imply that the rainbelt contracted toward the Equator in response 
to increased ice volume at high latitudes. By reducing the extent 
of woodland or wetlands in terrestrial ecosystems, the observed 
changes in the hydroclimate of southeastern Africa—both in terms 
of its long-term state and marked precessional variability—could 
have had a role in the evolution of early hominins, particularly in 
the extinction of Paranthropus robustus. 

Subtropical southeastern Africa is a region of critical interest because 
it contains hominin fossils that enable a comparison between continental 
indicators of hominin evolution and nearby marine records of past 
climate changes. Different modes of climate change have previously 
been proposed as major factors that influenced hominin speciation, 
adaptation or extinction. Some authors stress the effect of long-term 
trends towards aridity on hominin evolution, whereas others suggest 
a crucial role of short periods of extreme climatic variability in driving 
hominin evolution. 

P. robustus fossils have been found only in southeastern Africa and 
exclusively in the Limpopo River catchment, at the sites of Cooper’s 
Cave D, Drimolen, Swartkrans, Sterkfontein, Kromdraai B and 
Gondolin (Fig. 1), which date from at least about 2 million years ago 
(Ma) to 0.9 Ma (Extended Data Table 1). It is currently unclear whether 
climate stress could have had a role in the extinction of this species. 


To investigate the hydroclimatic context of the environment in which 
P. robustus lived, we reconstructed changes in the Quaternary hydro- 
logical cycle in subtropical southeastern Africa (20-25° S) to deter- 
mine the drivers of variability and to identify the long-term climate 
evolution of this region. We used marine sediment core MD96-2048 
(26° 10′ 482″ S, 34° 01′ 148″ E, 660-m water depth) from offshore of 
the Limpopo River mouth (Fig. 1). The chronology of MD96-2048 
is established by tuning the δ¹⁸O benthic foraminifera signal to the 
reference LR04 stack (Fig. 2, Methods), which confirms that the core 
covers the past 2.14 million years (Myr). We present a multiple-proxy 
record of hydrological changes in the Limpopo catchment together 
with a sea surface temperature (SST) record of the southwestern Indian 
Ocean (Fig. 2). 

Modern precipitation in the Limpopo catchment is dominated 
by austral summer rainfall associated with the extension of the 
Intertropical Convergence Zone (ITCZ) southwards to 15-20° S 
(Fig. 1). Changes in the hydrological cycle in the catchment are 
imprinted on discharge from the Limpopo River. We observe large 
changes in sedimentary elemental ratios of terrestrial iron to marine 
calcium, which indicate changes in terrestrial discharge by the Limpopo 
River at orbital and longer timescales (Fig. 2, Extended Data Fig. 1). 
Maxima in ln(Fe/Ca) ratios are associated with a more-depleted stable 
hydrogen isotope composition of plant waxes (δDwax) (Fig. 2), which 
reflects the isotopic composition of precipitation and is indicative 
of a higher amount of regional rainfall (Extended Data Fig. 2, 
Methods). Maxima in ln(Fe/Ca) ratios are also associated with 
maxima in concentrations of branched glycerol dialkyl glycerol 
tetraethers (brGDGT), which are commonly found in soils and attributed 
to Limpopo River runoff, and more-enriched plant-wax δ¹³C. A 
previous study of the past 0.8 Myr of core MD96-2048 interpreted shifts 
towards more-depleted δ¹³Cwax as potentially reflecting more-humid 
conditions. However, δ¹³Cwax is a proxy for the contribution of waxes 
from C3 versus C4 plants to the sediments, which can be influenced by 
many other factors than aridity and humidity. We attribute the enriched 
plant δ¹³Cwax at times of depleted δDwax values, and increased ln(Fe/Ca) 
and concentrations of brGDGT, to stronger transport of C4 plant 
material from the upper Limpopo catchment, in addition to the extension 
of riverine swamps and floodplains that contain abundant C4 sedges 
(Fig. 2, Extended Data Figs. 3, 4). 

The ln(Fe/Ca) ratio has the highest temporal resolution (300 years 
on average) of the proxies that we used. Statistical analyses indicate 
significant 19- and 23-thousand-year (kyr) cycles (precession) and 
100-kyr and 400-kyr cycles (eccentricity) but no significant 41-kyr cycle 
(obliquity) (Extended Data Fig. 5). The dominance of eccentricity and 
precession cycles indicates a strong influence of low-latitude insolation 
on hydrological changes in the Limpopo catchment. Rainfall and 
precession maxima are in phase (that is, maxima of local insolation in 
the Southern Hemisphere) (Extended Data Figs. 3, 5). 

Modern-day rainfall in the subtropical Limpopo region depends 
mainly on easterly waves and low-pressure cells that largely control 
summer rainfall (November to March, 72% of the rainfall in Pretoria; 
Extended Data Fig. 2), and on tropical-extratropical cloud bands 
and associated thunderstorms. Convective rains are usually 
associated with the ITCZ and warm, humid easterly winds. Numerical 
model experiments suggest southern summer insolation forcing 
exerts a strong and positive effect on monsoon rainfall. During 
precession maxima, higher levels of Southern Hemisphere summer 
insolation cause higher temperatures and lower surface pressure over 
the Southern Hemisphere, in particular over land. The contrast 
between land and ocean temperatures results in stronger easterly 
moisture inflow into southeastern Africa. Increased rainfall results 
from increased convection over, and increased humidity transport 
into, southeastern Africa. Because eccentricity modulates precession 
amplitudes, the increased summer insolation associated with 
eccentricity increases the variability in rainfall and fluvial discharge 
in the Limpopo catchment (Fig. 3). 

In addition to this, it is thought that SST anomalies in the Indian 
Ocean have an influence on summer rainfall in the region. To explore 
the potential relationship between hydrological cycle changes and 
oceanic conditions, we reconstructed SSTs for the southwestern Indian 
Ocean using two different methods (Extended Data Fig. 6, Methods). 
There is a significant correlation between SST and orbital parameters 
at the 100-kyr cycle (glacial-interglacial periodicity) and the 41-kyr 
cycle (obliquity) (Extended Data Fig. 5). The results confirm a previous 
study that revealed the absence of significant precessional variability in 
the SST record over the past 0.8 Myr. This suggests that orbital-scale 
precipitation changes in southeastern Africa are more closely related to 
the contrast between land and ocean temperatures than to SST changes. 

Superimposed on the orbital-scale changes, our record displays a 
long-term trend towards more arid conditions in southeastern Africa 
between about 1 and 0.6 Ma (Figs. 2, 3b, c). This period corresponds 
to the mid-Pleistocene transition (MPT), which is marked by ice-sheet 
expansion and global SST decrease (Fig. 2). In terms of hydrological 
changes, the record from Lake Malawi that covers the past 1.3 Myr 
has previously been interpreted to show changes that are opposite 
to those that we observe for the Limpopo catchment. At Lake Malawi, 
the climate changed from a predominantly arid environment between 
1.3 and about 1 Ma to generally wetter conditions after about 1 Ma. 
This opposing pattern in hydrological changes between the Limpopo 
catchment and Lake Malawi suggests a gradual contraction of tropical 
rainfall from the Limpopo catchment towards lower latitudes in 
response to the ice sheet expansion during the MPT. This rainfall shift 
could be related to increased Antarctic ice volume during the MPT. 

Fig. 1 | Modern climatology over southern Africa and vegetation types 
in the Limpopo catchment. a, Averaged precipitation rates for January 
and annual SST over the Indian Ocean. Black arrows represent the 
atmospheric circulation over southern Africa during austral summer. 
The ITCZ and the Congo air boundary (CAB) are indicated. b, Modelled 
relative C4 plant abundance in the Limpopo catchment with indications 
of topography (from SRTM, Shuttle Radar Topography Mission; 
https://www2.jpl.nasa.gov/srtm/) and bathymetry (from GEBCO, General 
Bathymetric Chart of the Oceans; https://www.gebco.net). Location of core 
MD96-2048, the Lake Malawi records, the main sites of hominin finds for 
P. robustus (sites of Cooper’s Cave D, Drimolen, Swartkrans, Sterkfontein, 
Kromdraai B and Gondolin) and A. sediba (Malapa) (Extended Data 
Table 1) and the Pretoria Global Network of Isotopes in Precipitation 
station are indicated. 

The long-term trend towards wetter conditions at Lake Malawi has 
been explained by a progressively less-positive Indian Ocean Dipole 
since approximately 1 Ma. The Indian Ocean Dipole can enhance or 
reduce the precessional variability in the low-latitude hydroclimate by 
modifying the Walker circulation over the Indian Ocean, with a diverse 
response in eastern Africa. A progressively less-positive Indian Ocean 
Dipole would have generated wetter conditions in southern Africa 
and increased the precession signal. Although the precessional signal 
increased over time in the Limpopo record, the observed progressive 
increase in aridity is contrary to what would be expected from Indian 
Ocean Dipole forcing alone (Fig. 3). 

On the basis of our new records and comparisons with published 
records for Lake Malawi, we propose that low-latitude insolation forcing 
(precession and eccentricity) and changes in ice volume at high 
latitudes were the main drivers of the southeastern African hydroclimate 
over the past 2 Myr and that SST forcing had a secondary role. Our 
results also highlight a large regional variability in the southeastern 
African hydroclimate. 

Changes in the hydrological cycle in southeastern Africa were probably 
one of multiple factors that influenced the dispersal and evolution 
of human relatives. The more-humid conditions observed between 
about 2 Ma and 1.75 Ma, associated with a maximum in eccentricity 
forcing, correspond to several occurrences of P. robustus (Fig. 3, 
Extended Data Table 1). P. robustus is a species that was tolerant of 
environmental variability (eurytopic), but multiple lines of evidence suggest 
that from its earliest to its last occurrence this species preferred the 
wooded or humid components (dominated by C3 plants) of environments 
that were otherwise dominated by C4, dry-adapted plants. This 
preference for habitats dominated by C3 plants is well corroborated 
by the data from palaeoecological studies of other contemporaneous 
animals that indicate that large quantities of C4 vegetation were available, 
but that a more-wooded component and water sources were also 
present (Fig. 3f, Extended Data Table 2, Methods). This more-humid 
period between about 2 Ma and 1.75 Ma is also characterized 
by the presence of Australopithecus sediba in the Limpopo catchment 
(so far known only from the Malapa site). Multiple lines of evidence 
suggest A. sediba also lived in a wooded or humid habitat and had a diet 
dominated by C3 plants, within an otherwise rather-open environment 
dominated by C4 plants. 

Our data raise the possibility that increasing long-term aridity 
associated with multi-millennial-scale changes after 1 Ma, driven by the 
MPT (Fig. 3b-d), could have diminished the wooded and/or humid 
component of the habitat preferred by P. robustus. This is consistent 
with a trend towards more open and drier landscapes at Swartkrans 
(Fig. 3f, Extended Data Table 2, Methods). 

Fig. 2 | Hydrological changes in the Limpopo catchment compared to 
SSTs of the southwestern Indian Ocean over the past 2.14 Myr. 
a, δD of the n-C₃₁ alkane of plant waxes. Mean analytical uncertainty of 
3‰ is indicated (n = 8). b, δ¹³C of the n-C₃₁ alkane of plant waxes. Data 
from a previous study are shown in light green, and data from this study 
are shown in dark green. Mean analytical uncertainties of 0.2‰ (n = 15) 
and average reproducibility of 0.4‰ are indicated. c, Pollen percentages 
of Cyperaceae (data from the past 342 kyr from a previous study). Error 
bars represent 95% confidence intervals. d, ln(Fe/Ca) X-ray fluorescence 
(XRF) ratios. Arrow indicates the long-term trend discussed in the text. 
Grey frames represent events of hydrological cycle intensification. 
e, Principal component of the SST records (Methods). f, δ¹⁸O of benthic 
foraminifera compared to the reference LR04 curve (data from the past 
790 kyr from a previous study) (Methods). The MPT is indicated. 

It has previously been proposed that extinctions of large mammals 
are caused mainly by abiotic environmental changes. We propose 
as a speculative, but plausible, scenario that the geographic ranges 
of species that preferred wooded and humid habitats, including 
P. robustus, would have contracted and expanded according to 
precessional (approximately 21-kyr) dry and wet cycles. During the 
multi-millennial dry periods, population ranges of these species would have 
contracted and often become fragmented. These isolated populations 
would have been especially prone to local extinction through lack 
of sufficient suitable food, water and shelter, and related increased 
competition and predation. During multi-millennial wet periods 
associated with precessional maxima, preferred woodland and 
humid habitats would have expanded again. The surviving 
populations would have expanded into their previously occupied range, 
replacing locally extinct populations. The long-term trend toward 
increased aridity implies that the dry periods became more and 
more pronounced between 1 and 0.6 Ma (Fig. 3b, d), increasing 
the likelihood of more numerous extinctions of local populations 
until the last remaining population, and therefore the species, went 
extinct. Given that the C3 component of the vegetation preferred by 
P. robustus was never dominant in the landscape, even during humid 
periods, populations of this species would have been especially prone 
to local extinction during dry periods. 
Both the long-term state of aridification and the extreme precessional 
variability in hydroclimate (Fig. 3) could therefore have contributed to 
the extinction of P. robustus. 

Fig. 3 | Forcings on the hydrological cycle changes in the Limpopo 
catchment and relationship with hominin evolution over the past 
2.14 Myr. a, Eccentricity and precession index. b, ln(Fe/Ca) XRF ratios 
as a proxy for hydrological changes in the Limpopo catchment. Red curve 
denotes a polynomial fit (9th degree). The MPT and associated ice-sheet 
expansion is indicated (orange shading). Grey shading indicates wetter 
conditions in the Limpopo catchment, associated with higher eccentricity. 
c, Cumulative sum of ln(Fe/Ca) showing deviation from the mean as 
indicator of hydrological variability in the Limpopo catchment (Methods) 
(red curves correspond to dry periods, yellow curves to drying periods, 
dark blue curves to wet periods and light blue curves to humidification 
periods). d, Precession component of the ln(Fe/Ca) ratios obtained by 
Gaussian filtering. e, Estimated ages for the main sites that yielded remains 
of the hominins P. robustus (Cooper’s Cave D, Drimolen, Swartkrans and 
Gondolin) and A. sediba (Malapa) (Extended Data Table 1). f, Enamel 
δ¹³C (‰ Vienna Pee Dee Belemnite) of hominins and contemporaneous 
herbivores at Swartkrans (Extended Data Table 2). Triangles indicate raw 
data points. The box plots correspond to the median (horizontal line), the 
interquartile range (box) and the full range of data (vertical whiskers). 
Sample sizes are indicated by numbers below the plots. Dashed lines 
highlight the thresholds used to estimate the percentage of 
C3-plant-derived foods in the diets. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0309-6. 
METHODS 


XRF measurements. Element intensities were measured using an XRF-Avaatech 
core scanner at EPOC. Before analysis, the sediment surface was flattened and 
covered with Ultralene film. The core sections were scanned at a 0.5-cm resolution 
at two different levels of energy (10 and 30 keV). 

To easily identify periods in the XRF ln(Fe/Ca) record, we computed the 
cumulative sum (cusum) of the deviations from the mean: 
cusum = sum(ln(Fe/Ca) − mean(ln(Fe/Ca))). The cumulative sum method was 
developed for industrial control to detect changes in sequential 
production. More recently, the cumulative sum has been extensively 
applied in biological oceanography. We first confirmed the regularity of 
our time series (no periods with greater sampling effort; the average 
sampling interval was 283 years, with only 8% of scattered time gaps 
greater than 500 years) and ensured that our periodicity matched that of 
regularized time series at 500-year and 1,000-year windows. The cumulative sum 
shows periods (that is, linear sequences) and their value with respect to the 
long-term average: a positive slope shows a period of values greater than the 
long-term average, and a negative slope shows a period of values smaller than 
the long-term average. The steepness of the slope reflects how different a 
period is from the long-term average. Furthermore, changes in the tendency 
(that is, sequential periodical changes in the slope) reflect periodical 
changes from a set of conditions to another (for example, a change from a 
positive slope to a flat slope in the XRF ln(Fe/Ca) record reflects a change 
from a humid period to a drier period). In the case of the 1-0.6-Ma interval, 
this corresponds to a period of aridification because it is more 
arid than the previous period between 1.2 Ma and 1 Ma (that is, there is a change 
of slope from steeply positive to slightly negative). In addition, the period between 
1 Ma and 0.6 Ma is very variable, because the slope is noisy rather than straight. 
The cumulative sum can be found under the function name local.trend, in the R 
package Pastecs (package of analysis of space-time ecological series). 
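
The cumulative-sum calculation described above can be sketched in a few lines. This is a minimal illustration in Python with NumPy rather than the R function cited in the text; the function name `cusum` and the six-point series are hypothetical and are not data from core MD96-2048.

```python
import numpy as np


def cusum(values):
    """Cumulative sum of the deviations of a series from its own mean."""
    x = np.asarray(values, dtype=float)
    return np.cumsum(x - x.mean())


# Hypothetical series: three values above the mean, then three below it.
series = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
curve = cusum(series)

# The slope between successive points is positive while the values are
# above the long-term mean and negative while they are below it.
slopes = np.diff(curve)
```

In this toy series the curve rises over the first three points and falls over the last three, and it returns to zero at the end because the deviations from the mean always sum to zero.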
Plant-wax δD and δ¹³C. Plant-wax analyses were carried out at MARUM. Samples 
were oven dried at 40°C, homogenized and squalane was added as an internal 
standard before extraction. Lipids were extracted with a DIONEX accelerated 
solvent extractor using a 9:1 mixture of dichloromethane to methanol at 100°C 
and 1,000 p.s.i. for five minutes, repeated three times. The saturated hydrocarbon 
fraction was obtained by elution of the dried lipid extract over a silica column with 
hexane and subsequent elution over AgNO3-coated silica to remove unsaturated 
hydrocarbons. 

Compound-specific stable carbon isotope (δ¹³C) analyses were carried out 
using a Thermo Fisher Scientific Trace GC Ultra coupled to a Finnigan MAT 252 
isotope-ratio-monitoring mass spectrometer via a combustion interface operated 
at 1,000°C. Isotope values were calibrated against external CO₂ reference gas and 
are reported as parts per thousand (‰) against the Vienna Pee Dee Belemnite 
(V-PDB) standard. Samples were run in at least duplicate. The internal standards 
yielded a precision of <0.3‰. Repeated analysis of an external n-alkane standard 
between samples yielded a root-mean-squared accuracy of 0.1‰ and a standard 
deviation of on average 0.2‰. 

When possible, given the high amount of lipids necessary, 
compound-specific stable hydrogen isotope (δD) compositions were measured using a Thermo 
Fisher Scientific Trace GC coupled via a pyrolysis reactor operated at 1,420°C 
to a Thermo Fisher MAT 253 isotope-ratio mass spectrometer. δD values were 
calibrated against external H₂ reference gas, and the H₃⁺ factor was monitored daily 
(values vary between 6.7 and 6.9). δD values are reported in parts per thousand 
(‰) versus the Vienna Standard Mean Ocean Water (VSMOW) standard. The 
internal standards yielded a precision of 2‰ on average. Repeated analysis of an 
external n-alkane standard between samples yielded a root-mean-squared accuracy 
of <1‰ and a standard deviation of on average 3‰. 

The results of a method to adjust the δDwax record for vegetation and ice-volume 
changes are given in Extended Data Fig. 4. 
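
The isotope values in this section are reported in parts per thousand against a reference standard, which is the usual delta notation: δ = (R_sample / R_standard − 1) × 1,000, where R is the heavy-to-light isotope ratio. The sketch below illustrates that conversion. The function names are hypothetical, and the two reference ratios are commonly quoted values that are assumed here and do not come from the text.

```python
# Reference isotope ratios. These numbers are assumptions (commonly quoted
# values), not values stated in the Methods.
R_VPDB_13C = 0.0112372   # 13C/12C of the V-PDB standard
R_VSMOW_D = 155.76e-6    # D/H of the VSMOW standard


def delta_permil(r_sample, r_standard):
    """Delta value in parts per thousand relative to a standard."""
    return (r_sample / r_standard - 1.0) * 1000.0


def ratio_from_delta(delta, r_standard):
    """Inverse conversion: isotope ratio implied by a delta value."""
    return r_standard * (1.0 + delta / 1000.0)


# Example: the 13C/12C ratio of a sample with a delta value of -30 permil.
example_ratio = ratio_from_delta(-30.0, R_VPDB_13C)
```

A sample with the same ratio as the standard has a delta value of zero; negative values ("depleted") mean less of the heavy isotope than the standard, and positive values ("enriched") mean more.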

Pollen preparation. Samples of 1.5 to 8.5 ml were prepared at MARUM. The 
volume was measured using water displacement. Samples were decalcified with 
diluted HCl (~12%) and treated with HF (~40%) to remove silicates. Samples 
were sieved over a screen to remove particles smaller than 10-12 μm. When 
necessary the sample was decanted to remove remaining silt. Samples were stored 
in water, mounted in glycerol and microscopically examined (magnification 400× 
and 1,000×) for pollen and spores. Cyperaceae (sedges) pollen percentages were 
calculated based on the total number of pollen and spores, ranging from 53 to 365, 
and 95% confidence intervals were calculated according to a previously published 
method. 
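
The text does not say which method was used for the 95% confidence intervals on pollen percentages. As an illustration only, the sketch below uses the Wilson score interval for a binomial proportion, which is an assumed stand-in and not necessarily the published method. The function name and the example counts are hypothetical; only the totals of 53 and 365 echo the range given above.

```python
import math


def wilson_interval(count, total, z=1.96):
    """Wilson score interval for a proportion count/total (z=1.96 for ~95%)."""
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("need 0 <= count <= total and total > 0")
    p = count / total
    denom = 1.0 + z ** 2 / total
    centre = (p + z ** 2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z ** 2 / (4.0 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts: about 20% Cyperaceae in a small and a large pollen sum.
small_low, small_high = wilson_interval(11, 53)
large_low, large_high = wilson_interval(73, 365)
```

The interval for the sum of 53 is much wider than for the sum of 365, which is why the error bars in a pollen diagram vary from sample to sample.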

δ¹⁸O analyses on foraminifera. Specimens of benthic Planulina wuellerstorfi 
foraminifera were picked from the 250-315-μm size fraction. Analyses were 
carried out by a coupled system Multiprep-Optima of Micromass at EPOC. The 
automated preparation system (Multiprep) transforms carbonate samples into CO₂ 
gas by treatment with orthophosphoric acid at a constant temperature of 75°C. The 
CO₂ gas samples were then analysed by isotope mass spectrometry (Optima) in 
comparison with a calibrated reference gas to determine the isotopic ratio ¹⁸O/¹⁶O 
of the sample. For all stable oxygen isotope measurements a working standard 
(Burgbrohl CO₂ gas) was used, which was calibrated against the Vienna Pee Dee 
Belemnite (V-PDB) standard by using the NBS 19 standard. Analytical standard 
deviation is about 0.05‰ (±1σ). 

The chronology of the core was established by tuning the δ¹⁸O benthic 
foraminifera signal to the reference LR04 stack with the AnalySeries software 
and yielded a correlation coefficient of R = 0.8 for the past 2.14 Myr. The core is 
about 36 m long and the sedimentation rate has a mean value of 2 cm per kyr 
(7 =0.91) and is relatively constant. 
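
As a rough consistency check on the figures quoted in the Methods, the core length and age span can be turned into a mean sedimentation rate, and the 0.5-cm XRF scanning step into an approximate time step. The sketch below uses only numbers from the text (about 36 m, 2.14 Myr, 0.5 cm); the function names are hypothetical, and a constant sedimentation rate is assumed for simplicity.

```python
def mean_sedimentation_rate(length_cm, span_kyr):
    """Mean sedimentation rate in cm per kyr."""
    return length_cm / span_kyr


def years_per_step(step_cm, rate_cm_per_kyr):
    """Time represented by one sampling step, in years."""
    return step_cm / rate_cm_per_kyr * 1000.0


# About 36 m of sediment covering 2.14 Myr.
rate = mean_sedimentation_rate(36.0 * 100.0, 2.14 * 1000.0)

# One 0.5-cm XRF scanning step at that mean rate.
step_years = years_per_step(0.5, rate)
```

This gives about 1.7 cm per kyr, in line with the stated mean of roughly 2 cm per kyr, and close to 300 years per scanning step, which agrees with the resolution quoted for the ln(Fe/Ca) record.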
SST reconstruction. Globigerinoides ruber sensu stricto were picked within the 
250-315-μm size fraction for trace element analyses. Shells were cleaned at EPOC 
to eliminate contamination by clays and organic matter based on a previously 
published procedure. An Agilent inductively coupled plasma optical emission 
spectrometer (ICP-OES) was used for magnesium and calcium analyses following 
a previously established procedure. Reproducibility obtained from G. ruber s. s. 
on 80 samples from the complete core was better than 6% (±1σ, pooled standard 
deviation). All new analyses for this study (n = 217) were performed at EPOC. 
Measured Mg/Ca ratios were converted into temperature values applying a 
previously established equation, yielding a precision of 1.2°C. 
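
The calibration equation itself is not given in this section. Mg/Ca palaeothermometry commonly uses an exponential relation of the form Mg/Ca = B·exp(A·T), and the sketch below illustrates how such a relation is inverted. The constants A and B are placeholder values of a typical magnitude for planktonic foraminifera; they are assumptions, not the authors' calibration, and the function names are hypothetical.

```python
import math

# Placeholder calibration constants (assumed, not taken from the text).
A = 0.09   # exponential constant, per degree C
B = 0.38   # pre-exponential constant, mmol/mol


def mgca_to_temperature(mgca, a=A, b=B):
    """Invert Mg/Ca = b * exp(a * T) to obtain temperature in degrees C."""
    return math.log(mgca / b) / a


def temperature_to_mgca(temp_c, a=A, b=B):
    """Forward relation: Mg/Ca (mmol/mol) expected at a given temperature."""
    return b * math.exp(a * temp_c)


def temperature_shift(relative_error, a=A):
    """Temperature change equivalent to a relative change in Mg/Ca."""
    return math.log(1.0 + relative_error) / a


# Temperature equivalent of a 6% change in measured Mg/Ca.
shift_for_6_percent = temperature_shift(0.06)
```

With these placeholder constants, a 6% change in Mg/Ca corresponds to roughly 0.6 to 0.7°C. The pre-exponential constant cancels out of that conversion, so only the exponential constant matters for translating reproducibility into temperature.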

Total assemblages of planktonic foraminifera were analysed at EPOC using 
an Olympus SZH10 binocular microscope following previously published 
taxonomy. About 300 specimens were counted in each level after splitting with 
an Otto microsplitter. Relative abundances of species were used to perform 
quantification of SST after an ecological transfer function developed at EPOC. The 
method used here is based on the modern analogue techniques running under 
the R software, using the ReconstMAT script developed by J. Guiot (BIOINDIC 
package, https://www.eccorev.fr/spip.php?article389). The modern database used 
is composed of 367 core tops and derived from core tops covering the southern 
Indian Ocean in the MARGO project. Calculations of past hydrological 
parameters rely on a weighted average of SST values from the best five modern 
analogues, with a maximum weight given to the closest analogue in terms of 
statistical distance (that is, dissimilarity minimum). This method permits the 
reconstruction of annual SST with a precision of 0.8 °C (Extended Data Fig. 7). 
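
The weighted-average step of the modern analogue technique can be sketched as follows. This is not the ReconstMAT script. The squared chord distance and the inverse-distance weights are assumptions chosen for illustration, since the text only states that the closest analogue receives the largest weight, and the small six-sample database is invented.

```python
import numpy as np


def squared_chord_distance(sample, reference):
    """Dissimilarity between relative-abundance vectors (assumed metric)."""
    return np.sum((np.sqrt(sample) - np.sqrt(reference)) ** 2, axis=-1)


def mat_estimate(fossil, modern_assemblages, modern_sst, k=5, eps=1e-9):
    """SST as a weighted mean over the k most similar modern samples."""
    fossil = np.asarray(fossil, dtype=float)
    modern = np.asarray(modern_assemblages, dtype=float)
    sst = np.asarray(modern_sst, dtype=float)
    distances = squared_chord_distance(fossil, modern)
    nearest = np.argsort(distances)[:k]
    # Inverse-distance weights: the closest analogue gets the largest weight.
    weights = 1.0 / (distances[nearest] + eps)
    return float(np.sum(weights * sst[nearest]) / np.sum(weights))


# Invented database: six core tops, three species, one SST per core top.
modern = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.3, 0.2],
          [0.4, 0.4, 0.2], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]
sst = [27.0, 26.0, 25.0, 24.0, 23.0, 22.0]

exact_match = mat_estimate(modern[2], modern, sst)
in_between = mat_estimate([0.45, 0.35, 0.20], modern, sst)
```

A fossil assemblage identical to a core top returns essentially that core top's SST, and an intermediate assemblage returns a value between those of its neighbours. The estimate can never fall outside the range of the selected analogues, which is a known limitation of analogue methods.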

As each proxy has some uncertainty related to the calibration, non-temperature 
influences and lateral advection, we applied empirical orthogonal function 
analysis to the two SST records for the past 2.14 Myr (Extended Data Fig. 6). The 
first principal component (PC1) explains 74% of the total variance for the Mg/Ca 
and foraminifera transfer function records over the past 2.14 Myr. The correlation 
between SST proxies and PC1 over the past 2.14 Myr is R = 0.71. 
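
Extracting a shared signal from two records by empirical orthogonal function (principal component) analysis can be sketched as below. The series are synthetic, the function name is hypothetical, and standardizing each record before the decomposition is an assumption, because the text does not say how the records were scaled or interpolated.

```python
import numpy as np


def first_principal_component(record_a, record_b):
    """PC1 of two standardized series and the variance fraction it explains."""
    data = np.column_stack([record_a, record_b]).astype(float)
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    pc1 = z @ eigvecs[:, -1]                  # scores on the leading mode
    explained = eigvals[-1] / eigvals.sum()
    return pc1, explained


# Synthetic records: a common signal plus independent noise in each proxy.
rng = np.random.default_rng(0)
common = rng.normal(size=500)
proxy_a = common + 0.8 * rng.normal(size=500)
proxy_b = common + 0.8 * rng.normal(size=500)

pc1, explained = first_principal_component(proxy_a, proxy_b)
r = np.corrcoef(proxy_a, proxy_b)[0, 1]
```

For two standardized series the variance explained by PC1 equals (1 + |r|) / 2, where r is the correlation between them, so the leading mode always explains at least half of the total variance. The sign of PC1 is arbitrary and is usually fixed by requiring a positive correlation with the input records.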

Climate modelling. To investigate the control on the past δD composition of 
precipitation in the Limpopo catchment, we analysed the results of a transient run with 
the intermediate complexity isotope-enabled climate model iLOVECLIM over 
the past 150 kyr. The atmospheric part of the coupled climate model was run at 
T21 spatial resolution (~5.65° latitude and longitude) and used accelerated forcing 
(irradiance, greenhouse gases and ice sheets were updated with an acceleration 
factor of 10). Intermediate complexity models experience some weaknesses caused 
by the spatial resolution and simplified convective physics but have the advantage 
of efficient computation. iLOVECLIM was previously successfully applied in the 
Asian monsoon region and in the West African monsoon region to investigate 
past monsoonal precipitation changes and their links with changes in the isotopic 
composition of precipitation. For the current study, analysis of the present-day 
performance of the iLOVECLIM model for the region, together with results over 
the past 150 kyr, are shown in Extended Data Fig. 2. 

Ecology and environment of the hominins P. robustus and A. sediba in the 
Limpopo catchment. Building on previous research, we produce a short 
synthesis of multiple lines of evidence to reconstruct the ecology and environmental 
context of the southern African robust australopith P. robustus. We also briefly 
discuss A. sediba. Much literature has been devoted to the ecology of the robust 
australopiths, and particularly to the ecological differences between P. robustus and 
P. boisei, another species that is found in eastern Africa. Given that our geographic 
scope is centred on South Africa, we here focus mostly on P. robustus. It remains 
a matter of debate whether the robust australopiths (P. robustus, P. boisei and 
P. aethiopicus, the probable ancestor of P. boisei, also found in eastern Africa) are 
closely related to each other in a monophyletic genus or whether robust adaptive 
traits evolved independently by convergence or parallelism in eastern Africa and 
southern Africa, which would make the Paranthropus genus diphyletic. Authors 
who support the diphyletic hypothesis tend to classify the robust australopiths in 
the genus Australopithecus. Until the issue is settled, here we use the genus name 
Paranthropus as a convenient taxonomic label of a grade that groups these three 
taxa, which share similar robust adaptive features that differ from the more-gracile 
genera Australopithecus and Homo. 

Craniomandibular and dental morphology. Species of Paranthropus are char- 
acterized by a suite of craniomandibular and dental morphological characters: 
reduced incisors and canines, molarized premolars, enlarged molars, thick 
enamel, enlarged insertion areas for temporal muscles and a robust mandibular 
corpus. This suite of robust morphological traits has been frequently interpreted 
as potential adaptations to diets including hard foods. This interpretation is 
partly supported in P. robustus by dental microwear data that indicate occasional 
consumption of hard objects (see below). Other authors have suggested that these 
characters reflect the ability of Paranthropus to chew tough plant matter for 
prolonged periods. This alternative interpretation is supported by dental microwear 
and stable carbon isotopic data in P. boisei. This interpretation could apply to 
P. robustus as well, even though this species probably fed on different kinds of plants 
to those eaten by P. boisei (C4 herbaceous plants in P. boisei versus various C3 and 
C4 plants in P. robustus, see below). 

Potential modern analogues of herbivorous mammals that evolved from omniv- 
orous ancestors that have thick enamel and bunodont molars include the giant 
panda Ailuropoda melanoleuca and the red panda Ailurus fulgens, both of which 
have independently become highly specialized on a diet of tough bamboo leaves 
that require prolonged chewing and numerous repeated chewing cycles. Both 
examples illustrate phylogenetic inertia, with evolution acting on preexistent 
morphologies that limit the range of possible adaptive traits in response to functional 
selective pressure and result in seemingly suboptimal morphologies for the realized 
diet. A similar argument has been made for Paranthropus. Therefore, contrary 
to herbivorous ungulates that evolved long shearing crests and a more lateral mas- 
tication to reduce plant matter into small digestible fragments, the giant panda, 
the red panda and Paranthropus evolved alternative solutions combining thick 
enamel and increased dental surface with prolonged use of high masticatory forces. 

It has also been suggested that an increased dental occlusal surface in mammals 
could be an adaptive trait for feeding on small bites of small-sized food items 
by increasing the chance of efficiently masticating the food items with a reduced 
number of chewing cycles, and limiting the wear induced by strongly increased 
attrition (tooth-to-tooth contact) in addition to the wear induced by abrasion 
(tooth-to-food contact). Similar adaptive traits, most notably enlarged surfaces of 
third molars, evolved independently multiple times in African herbivorous suids. 

Compared to Paranthropus, the craniomandibular and dental morphology 
of A. sediba is more gracile and more similar to that found in older species of 
Australopithecus and younger species of Homo. It has recently been argued that the 
morphology of the cranium was unsuited for feeding on hard objects, but that 
the morphology of the mandible was suited for this activity. 

Postcranial morphology. Very little is known about the postcranial morphology 
of P. robustus but most studies indicate the retention of traits adaptive for arboreal 
climbing. Hand morphology suggests an ability to perform precision gripping 
during tool-making and tool-using activities. Numerous bone tools are found in 
Swartkrans member 3 and Drimolen, sites that have yielded numerous specimens 
of P. robustus. Microscopic and macroscopic wear analyses as well as experimental 
data suggest that these tools were used for digging into termite mounds. The 
scarcity of Homo remains compared to those of P. robustus in sites bearing bone 
tools suggests that the latter species is most likely to be the tool maker and user. 
A. sediba is known from two partial skeletons that provide much information on its 
postcranial morphology. The upper limbs and shoulders of this species retained 
numerous adaptive traits for climbing and suspensory behaviours. 

Microstructure and biomechanics of enamel. A study of the dental microstructure 
of P. robustus indicates that its enamel was decussated (contrary to that of 
P. boisei), which is assumed to reflect a capacity to withstand strong and/or 
prolonged biomechanical constraints during mastication. Other mammals that feed 
on hard objects, such as hyaenids, and mammals that feed on tough vegetation, 
such as many ungulates, also display decussated enamel. 

In addition, an experimental study of the behaviour of enamel under various 
biomechanical constraints suggests that thick enamel could be an adaptive trait to 
deal with foods that are either hard or tough, or laden with particulates, potentially 
including both grit and phytoliths. A recent study observed a low frequency of 
enamel chipping in P. robustus, concluding that it was adapted to eating tough 
vegetation rather than hard foods. This interpretation is at odds with the dental 
microwear data that indicate at least some consumption of hard objects (see 
‘Dental microwear’). Alternatively, we argue that it is equally plausible that this 
low frequency of chipping is related to the specialized decussated microstructure 
that reinforces the tooth enamel of P. robustus at a microscopic scale, making its 
teeth less prone to chipping and therefore more durable when consuming hard 
or tough foods. 

Enamel biogeochemistry. Numerous data are available regarding the stable carbon 
and oxygen isotopic compositions (expressed as δ¹³C and δ¹⁸O, respectively) and 
the major elements (Sr, Ba and Ca) of the enamel of P. robustus, of other hominins 
and of other contemporaneous animals. The stable carbon isotope ratios enable 
the quantification of the proportions of food items deriving, directly or indirectly 
(through one or several trophic levels), from C3 plants (mostly woody vegetation, 
but also sedges in humid environments) and C4 plants (mostly grass and sedges, but 
also shrubs of the Amaranthaceae and Chenopodiaceae families). The δ¹³C values of 
the enamel of P. robustus indicate a diet that was dominated by foods derived from 
C3 plants, with a substantial proportion (about 35-40%) of foods derived from 
C4 plants (Fig. 3f and Extended Data Table 2). Much effort has been dedicated to 
identifying the food items that compose this substantial C4 component in the diet 
of P. robustus. The most probable C4-plant-derived food items include grass 
leaves and roots, insects such as grasshoppers and termites, and small vertebrates 
(birds, lizards, rodents and small ungulates) that consume either C4 plants or 
invertebrates that eat C4 plants. Sedges are also considered to be a likely source of C4 
resources, notably through the consumption of their underground storage organs 
(USOs). A previous study, however, has argued that C4 sedges were rather 
scattered in South African riverine settings and that not many of them produced 
large palatable underground storage organs. This study therefore suggested that C4 
sedges represented only a minor food resource for hominins. However, that study 
examined only the sedges from four riverine sites located in the Kruger National Park 
and it remains unknown whether its conclusions are valid for the whole Limpopo 
catchment and larger scales. 

More important in our opinion is the fact that the diet of P. robustus was 
dominated by C3-plant-derived food resources, which could include any parts 
(leaves, stems, fruits, nuts or underground storage organs) of C3 plants (mostly 
trees, shrubs and bushes in wooded environments but also abundant sedges in 
humid environments). There are few data on the enamel δ¹³C values of A. sediba 
(n = 2) but both specimens display low values (−12.1‰ and −12.2‰) that indicate 
a diet dominated by C3-plant-derived food resources. 

Stable oxygen ratios of enamel are related to multiple factors, including behaviour, 
ecology, diet, physiology and climate. A previous study classified mammals 
into evaporation-sensitive and evaporation-insensitive taxa. The δ¹⁸O values in 
evaporation-sensitive taxa increase with aridity. Evaporation-sensitive taxa do not 
drink much and get most of the water they require from the plants they consume. 
Conversely, δ¹⁸O values of evaporation-insensitive taxa track the δ¹⁸O values of 
surface water that they drink abundantly and frequently. Although hominins 
were not included in this classification scheme, they are probably water-dependent, 
as all large primates need to drink a lot of water every day. Hominins generally 
display relatively low δ¹⁸O values on average compared to the rest of the fauna, 
suggesting a high water dependence or consumption of plants containing little 
evaporated water. Those data support the preference of P. robustus for wooded 
and/or humid environments. 

Intra-tooth variations of δ¹³C and δ¹⁸O also provide information about intra-annual 
and inter-annual variation in ecology. Intra-tooth δ¹³C and δ¹⁸O profiles 
were measured using laser ablation on four teeth of P. robustus from Swartkrans 
member 1. The mean values of the δ¹³C profiles are similar to those of other 
P. robustus individuals sampled previously, and they also reveal substantial 
intra-tooth variation with ranges varying from 2‰ to 5‰ over periods representing 
approximately one or two years (inferred from the number of perikymata). 
Significant positive correlations between the δ¹³C and δ¹⁸O in three out of four 
specimens of P. robustus also indicate that they consumed relatively more 
C4-plant-derived foods during the dry season than during the wet season. This could 
suggest an opportunistic sampling of the environment, with a relative consumption 
of foods derived from C3 or C4 plants that depended in part on seasonal or 
inter-annual climatic differences. These data are also congruent with the identification 
of P. robustus as an evaporation-insensitive taxon and are similar to the pattern 
observed in some extant evaporation-insensitive herbivorous ungulates (for 
example, strong positive correlations of δ¹³C and δ¹⁸O profiles in an extant 
common hippopotamus Hippopotamus amphibius). 

Investigations of trace elements preserved in the enamel of P. robustus have 
also revealed interesting patterns. P. robustus is characterized by relatively high 
Sr/Ca ratios, lower than in grazing ungulates and slightly higher than in carnivores, 
browsing ungulates and omnivorous cercopithecid monkeys. Australopithecus 
africanus is characterized by even higher Sr/Ca ratios. Both hominins, but espe- 
cially A. africanus, display low Ba/Ca ratios. Altogether, these data indicate that 
the proportion of animal matter in their diets is likely to have been low. The only 
extant animals that combine a high Sr/Ca ratio and a low Ba/Ca ratio are the 
mole rat (Cryptomys hottentotus), and to a lesser extent, the common warthog 
(Phacochoerus africanus). Both species consume large amounts of grass roots. This 
could indicate that grass root consumption was an important aspect of the diet of 
P. robustus, explaining part of the C4 component of the diet. However, the Sr/Ca 
and the Ba/Ca ratios of A. africanus and P. robustus are not as extreme as those 
found in the mole rat and the common warthog, suggesting that the consumption 
of grass roots was probably not as important as in those two species. As previously 
highlighted, one must, however, stress that the use of major element concentrations 
in enamel as an indication of the diet of extant and extinct animals still 
requires further study. One previous study observed Sr/Ca and Ba/Ca ratios for 
P. robustus that are intermediate between those of A. africanus and early Homo, 
and similar to those of browsers: these authors argue the diet of P. robustus was 
dominated by woody plants. 
Dental microwear. Investigations of dental microwear have also provided useful 
information for reconstructing the diet of P. robustus. Dental microwear 
mostly results from a combination of abrasion (food-to-tooth contact) and attrition 
(tooth-to-tooth contact), both of which result in various microscopic scars left 
at the surface of the enamel facets. Among hominins, specimens of P. robustus 
are characterized by unique microwear textures that display strongly variable 
complexities, ranging from low to very high values, and low anisotropies. This 
pattern, originally observed on a few specimens, was later confirmed by a large- 
scale study of dental microwear in P. robustus, including numerous specimens 
from various sites. Similarities in dental microwear textures were noted between 
P. robustus and specimens of several extant primates that are known to consume 
hard objects such as nuts and seeds (grey-cheeked mangabeys, Lophocebus albigena 
or brown capuchins, Sapajus apella), indicating at least some consumption of hard 
objects by P. robustus. Dental microwear data available for A. sediba, although 
they are based on only two specimens, also show high complexities, suggesting a 
potential consumption of hard objects. 

Stable carbon isotopes of contemporary mammals. Compiled data of δ¹³C values 
for herbivorous mammals that were contemporary and sympatric with P. robustus 
indicate habitats encompassing a mix of C3 and C4 plants, suggesting a mix of 
woodlands and grasslands in all sites that were sampled (data for Swartkrans 
members 1, 2 and 3; Cooper’s Cave D; Gondolin; Fig. 3f and Extended 
Data Table 2). The enamel δ¹³C values of herbivorous mammals contemporaneous 
with P. robustus from all sites for which sufficient data are available display a 
bimodal distribution (Extended Data Table 2): the main mode indicates that the 
habitat was dominated by herbivorous mammals consuming mostly C4 plants, and 
the secondary mode indicates that a substantial portion of the remaining 
herbivorous mammals consumed C3 plants. Mammals displaying a strong C4 signal in 
their enamel probably consumed herbaceous plants, including mostly dry-adapted 
C4 grasses in open habitats and possibly C4 sedges in humid habitats. They are 
usually classified as grazers. Conversely, mammals displaying a strong C3 signal 
in their enamel probably fed mainly on woody plants (trees, shrubs or bushes) in 
woodlands, and possibly on C3 sedges in humid habitats. They are usually classified 
as browsers. Complicating factors of palaeodietary reconstructions include the 
potential consumption of CAM plants (mostly succulents in arid habitats, and 
epiphytic plants in closed forests) and C4 dicot woody vegetation (for example, shrubs 
of the family Amaranthaceae or Chenopodiaceae). However, the classification of 
mammals using the dichotomy between C3 browsers and C4 grazers inferred from 
δ¹³C values is generally confirmed by ecomorphological and dental wear studies. 
δ¹³C data of P. robustus are intermediate between the C4 pole and the C3 pole, but 
closer to the latter, indicating a preference of this hominin for the C3 wooded and/ 
or humid component of its habitat. 

The enamel δ¹³C data of the herbivorous mammals found at the Malapa site with 
A. sediba indicate an environment dominated by C4 plants, presumably grasses, 
and a few species consuming C3 plants, presumably woody plants. The pattern 
is therefore similar to that observed in sites containing P. robustus, indicating a 
preference of A. sediba for the C3 component (presumably woodland) of the 
otherwise C4-grass-dominated landscapes. 

Ecomorphology and community structures. A previous analysis quantified 
trophic and locomotor adaptive traits of mammals to compare the ecological 
diversity of modern and past faunal communities, and found evidence that P. robustus 
inhabited mosaic environments that included both woodlands and grasslands, 
always close to a water source. This study also detected a pattern of more-open 
habitats throughout the Swartkrans sequence. This observation supports the 
information provided by δ¹³C values of the herbivorous mammals that, based on our 
data compilation, generally indicate more C4 plants in the landscape, presumably 
grasses adapted to open and dry environments (Fig. 3f). Another previous study 
conducted a correspondence analysis using the relative abundances of the different 
groups of mammals classified as woodland-adapted and grassland-adapted, on 
the basis of stable carbon isotopes and uniformitarian comparisons to extant 
relatives. This study observed that the relative abundance of P. robustus follows the 
relative abundances of woodland-adapted species and is negatively correlated to the 
relative abundances of grassland-adapted species. An ecomorphological analysis 
(L. C. Bishop et al., personal communication) of bovid postcrania from Sterkfontein 
member 5B ‘Oldowan infill’ concluded that most bovid species were adapted to 
open grasslands, and reconstructed the environment as dominated by grasslands 
but with a nearby more-wooded component. Finally, another previous analysis 
has suggested that the absence of P. robustus in Sterkfontein member 5 west could 
be linked to a drier local habitat without water-dependent species. Overall, these 
studies indicate that grasslands were important and probably predominant in the 
environments occupied by P. robustus, but always with a more-wooded and humid 
component nearby. The ecological data gained from multiple lines of evidence 
suggest that P. robustus preferred this woodland and/or humid component. 

Combining the multiple lines of evidence. Based on the evidence discussed 
above, we interpret P. robustus as a species that was tolerant of environmental 
variability (eurytopic), especially in terms of dietary resources, but with a long-lasting 
preference for the wooded or humid components (characterized by C3 plants) 
of environments that were otherwise dominated by C4 dry-adapted plants. Such 
a selective feeding behaviour—with scarce components of the vegetation that 
are over-represented in the diet of an animal—is frequently observed in extant 
mammals. For example, extant geladas (Theropithecus gelada) from the Guassa 
Plateau in the highlands of Ethiopia rely extensively on forbs (about 38% of annual 
diet, and up to 61% of monthly diet) although their preferred forbs represent only 
8% of ground cover. A variable C3-plant-dominated diet, including both hard 
and tough food items, and displaying strong seasonal or inter-annual variations, 
is supported by the stable carbon and oxygen isotopes, the dental microwear 
textures and the overall morphological adaptive traits displayed by P. robustus. The 
robust craniomandibular and dental traits appear as a reasonable compromise to 
efficiently process an extremely diversified diet including numerous tough parts 
of plants along with some hard foods, and probably a large number of small food 
items that had to be eaten in small bites, with some exogenous grit particulates 
adhering (for example, termites or grass roots). 
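
The over-representation described in the gelada example can be expressed as a simple forage ratio (share of the diet divided by share of what is available). The sketch below is illustrative only: the function name is hypothetical, and the ratio is a generic selectivity index applied to the percentages quoted above, not a calculation taken from the cited study.

```python
def forage_ratio(diet_fraction: float, availability_fraction: float) -> float:
    """Share of the diet divided by share of availability.

    A value above 1 means the item is over-represented in the diet
    relative to its abundance in the environment.
    """
    if availability_fraction <= 0:
        raise ValueError("availability_fraction must be positive")
    return diet_fraction / availability_fraction


# Percentages quoted in the text for geladas on the Guassa Plateau.
FORB_GROUND_COVER = 0.08   # preferred forbs: 8% of ground cover
ANNUAL_DIET_SHARE = 0.38   # about 38% of annual diet
MAX_MONTHLY_SHARE = 0.61   # up to 61% of monthly diet

annual_ratio = forage_ratio(ANNUAL_DIET_SHARE, FORB_GROUND_COVER)
monthly_ratio = forage_ratio(MAX_MONTHLY_SHARE, FORB_GROUND_COVER)

print(f"annual forage ratio: {annual_ratio:.2f}")
print(f"maximum monthly forage ratio: {monthly_ratio:.2f}")
```

Both ratios are well above 1, which is the sense in which a scarce vegetation component is over-represented in the diet.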

The preference of P. robustus for habitats dominated by C3 plants, either in 
woodlands or in humid environments, is well corroborated by the data gained 
from the study of other animals, indicating large quantities of C4 vegetation but 
always with a more wooded component and the nearby presence of a water source. 
Frequent exploitation of the C4-plant food resources in the more open component 
of the landscapes is demonstrated by stable carbon isotopes that indicate a 
substantial, but not dominant, C4 component in the diet. Prehensile hands and the 
possible use of bone tools would have enabled P. robustus to access a great variety 
of foods in both wooded and open habitats. P. robustus may be best characterized 
as an ecotonic species, exploiting intermediate habitats in which the edge effect is 
maximized, enabling it to forage on a maximal variety of foods in both wooded 
and open habitats within a limited area while retaining access to secure shelters in 
woodlands. Such an ecology is displayed by the forest hog (Hylochoerus meinertz- 
hageni) that strongly depends on wooded humid forests for shelter, food and water, 
but which also frequently exploits open grasslands for additional plant resources. 
Data are scarcer for A. sediba but the multiple lines of evidence discussed above 
suggest an even stronger preference for C3-plant-dominated wooded habitats, 
when compared to P. robustus. It is worth noting that a similar preference for 
the C3-plant component, presumably woodlands, in overall C4-grass-dominated 
landscapes is also documented in several other hominin species from eastern 
Africa. 
Extinction of P. robustus. It appears likely that the last documented occurrence of 
P. robustus is dated about 0.9 Ma, or even later (see Extended Data Table 1). Indeed, 
multiple lines of dating evidence point to a young age of Swartkrans member 3, 
probably around 0.9 Ma but possibly as young as 0.6 Ma. It is also worth noting the 
frequency of artificial range truncation of extinct taxa, often called the Signor and 
Lipps effect: owing to the imperfect nature of the fossil record, the extinction 
date of a particular taxon is most probably more recent than the last fossil occur- 
rence of this taxon. Bearing in mind that this caution was formulated on the basis 
of the near-continuous and precisely dated marine palaeontological record, the 
Signor and Lipps effect is even more pertinent when considering extinction dates 
based on the spatio-temporally biased continental fossil record of terrestrial faunas 
(see details in ref. |). This dating uncertainty most probably applies to P. robustus 
as the known geographic distribution of this species is extremely small. Most of 
the sites that contain P. robustus are located within a circle of about 3-km radius, 
and only Gondolin is located a bit further north, around 25 km from Swartkrans 
and Sterkfontein. A circle of 12.5-km radius encompassing all the known occurrences 
of P. robustus would therefore represent a distribution area of about 500 km². 
Large mammals tend to have much larger geographic distributions than 500 km². 
For comparison, the critically endangered eastern gorilla (Gorilla beringei), the 
extant species of African great ape that has the smallest geographic distribution, 
is found over an area of about 70,000 km². Other great apes have much larger 
distributions (Gorilla gorilla, over 700,000 km²; Pan paniscus, about 156,000 km²; 
Pan troglodytes, over 2,600,000 km²; all data are from the International Union for 
Conservation of Nature website, http://www.iucnredlist.org/). It is therefore most 
likely that the real distribution area of P. robustus was much larger than the area that 
is currently sampled by the available fossil record, making the true last occurrence 
of the species unlikely to be sampled, and the real extinction date of the species 
probably younger than 0.9 Ma. 
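
The area comparison in this paragraph is simple arithmetic and can be checked with a short sketch. The variable and function names below are illustrative; the radius and the great-ape range sizes are the figures quoted in the text (two of them are stated as "over" a value, so they are treated here as lower bounds).

```python
import math


def circle_area_km2(radius_km: float) -> float:
    """Area of a circle of the given radius, in square kilometres."""
    return math.pi * radius_km ** 2


# Circle encompassing all known occurrences of P. robustus (from the text).
SAMPLED_RADIUS_KM = 12.5
sampled_area = circle_area_km2(SAMPLED_RADIUS_KM)

# Range sizes quoted in the text (km^2); "over" values are lower bounds.
GREAT_APE_RANGES_KM2 = {
    "Gorilla beringei": 70_000,
    "Pan paniscus": 156_000,
    "Gorilla gorilla": 700_000,      # "over 700,000"
    "Pan troglodytes": 2_600_000,    # "over 2,600,000"
}

print(f"sampled P. robustus area: {sampled_area:.0f} km^2")
for species, area in GREAT_APE_RANGES_KM2.items():
    print(f"{species}: about {area / sampled_area:.0f} times larger")
```

The circle covers roughly 490 km², consistent with the "about 500 km²" stated above, and even the smallest extant African great-ape range is more than two orders of magnitude larger.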

The fossil record of South Africa indicates that numerous species became 
extinct during the Pleistocene. However, the paucity of the record after 1.4 Ma 
seriously hinders our understanding of the pattern and timing of the extinctions. 
Middle Pleistocene faunas such as those of Florisbad (about 0.25 Ma) are almost 
entirely composed of species that are similar to the extant species whereas older 
faunas (for example, Cornelia at around 1 Ma or Elandsfontein at 1-0.6 Ma) 
contain a substantial proportion of extinct species (see ref. !!). Neither a precise 
date nor a time interval (1-0.5 Ma versus later during the Pleistocene) can be 
constrained for the extinction of those species based on their currently available 
fossil records. 

Our marine record and previously published terrestrial records are indic- 
ative of a clear trend through time towards more-open and drier landscapes 
in the Limpopo catchment. Given that P. robustus preferentially thrived in the 
non-dominant component of its environment (either woodland or humid grassland) 
characterized by C3 plants, and that the diet of this species was dominated 
by C3-plant-derived food items, including occasional hard objects, we assume that 
the regional trend towards a more arid hydroclimate after 1 Ma and the marked 
precessional variability affected the abundance of this species and its resilience to 
environmental changes. As the regional hydroclimate became drier, the wooded 
and humid environments favoured by P. robustus probably became progressively 
scarcer, strongly affecting the fitness and survival of populations of mammals that 
depended on these habitats for food, water and shelter. We propose a speculative, 
but plausible, scenario inspired by previous theoretical work. According to this 
scenario, the geographic ranges of taxa adapted to woodlands and humid habi- 
tats, including P. robustus, contracted and expanded according to the precessional 
~21-kyr dry and wet cycles. During multi-millennial dry periods, the range of 
populations of these taxa contracted and often became fragmented. The resulting 
isolated populations were especially prone to local extinction through increased 
competition and predation induced by the lack of sufficient and suitable food, water 
and shelters. During multi-millennial wet periods, preferred woodland and humid 
habitats would have expanded again and the surviving populations would have 
expanded into their previously occupied range, replacing locally extinct popula- 
tions. The long-term trend to aridity, inferred from our marine record, implies that 
dry periods became more and more marked through time, increasing the likeli- 
hood for local extinction of numerous local populations, until the extinction of the 
last remaining local population and therefore the extinction of the species. Thus, 
both long-term state and the extreme precessional changes in hydroclimate could 
have affected the evolution of P. robustus. A recent synthesis of factors involved 
in the extinction of large mammals—spanning three continents and the whole 
Cenozoic period—concluded that abiotic changes, such as climatic changes, were 
key players in the extinctions of species. 

What of Homo? Several authors (reviewed in ref. '™*) have suggested that strong 
morphological, behavioural and ecological differences between P. robustus and 
the contemporary Homo gave an evolutionary advantage to the latter over the 
former. Given that Homo did not go extinct and remained extremely widespread 
in Africa and beyond, some authors have related these differences to the extinction 
of P. robustus. Notably different relative abundances of these two taxa do suggest 
they were occupying separate ecological niches in the sites where they co-occur 
(Swartkrans 96% P. robustus, 4% Homo; Drimolen 84% P. robustus, 16% Homo, 
according to previous work). However, these scenarios are speculative and 
clearly beyond the scope of our paper, which is focused on P. robustus. Regardless 
of any potential evolutionary advantage of Homo over P. robustus, whether or not 
Homo went extinct locally in the Limpopo catchment during the aridification 
period is meaningless to the subsequent evolutionary history of the genus Homo. 
Remains of Homo are known in other parts of South Africa at around 1 Ma (for 
example, Elandsfontein and Cornelia-Uitzoek), as well as in other parts of 
Africa, so that any local extinction would have been counterbalanced by 
subsequent dispersals from other regions. The survival of Homo could plausibly 
be explained solely through plain contingency, especially as our synthesis as well as 
recent literature indicates a eurytopic ecology for both Homo and Paranthropus. 
Code availability. The iLOVECLIM source code is based on the LOVECLIM 
model version 1.2, for which code is accessible at http://www.elic.ucl.ac.be/modx/ 
elic/index.php?id=289. The developments of the iLOVECLIM source code are 
hosted at http://forge.ipsl.jussieu.fr/ludus, but are not publicly available owing to 
copyright restrictions. Access can be granted on demand by request to D.M.R. 
(didier.roche@lsce.ipsl.fr) to those who conduct research in collaboration with 
the iLOVECLIM users group. 

Data availability. The datasets generated during and/or analysed during the 
current study are available from the corresponding author on reasonable request. 
Source Data for Fig. 2 is provided with the paper. 
Extended Data Fig. 1 | ln(Fe/Ca) as a proxy for Limpopo runoff. 
Calcium and iron both have complex and multiple origins in marine 
sediments. Iron can be related to redox variations, detrital and fluvial 
input, among others, and calcium can be related to the biogenic fraction 
(foraminifera or nannofossils) and detrital input. To properly interpret the 
ln(Fe/Ca) ratio at our study location, we applied principal components 
analysis. a, PC1 describes 66% of the total variance for the entire site 
MD96-2048. The negative loadings for PC1 are calcium and strontium, 
and all other elements (aluminium, silicon, potassium, titanium, iron 
and zirconium) have positive loadings. Calcium and strontium are 
associated with biogenic carbonate and are mainly related to the presence of 
foraminifera. Element matrix correlation shows a strong positive linear 
correlation (R > 0.70) between iron and typically detrital elements, such 
as aluminium, silicon, titanium and potassium. Calcium shows negative 
correlation with iron (R = −0.5). b, ln(Fe/Ca) shows a strong correlation 
with PC1 (R = 0.94) and a strong relationship with Limpopo runoff 
proxies (Extended Data Fig. 3). Iron and titanium elements are related to 
terrigenous and siliciclastic components (heavy minerals and oxides) and 
the variation in carbonate content (calcium) is mainly due to dilution by 
terrigenous sediment. ln(Fe/Ca) is therefore a proxy of Limpopo runoff, 
consistent with previous studies in riverine basins throughout the African 
continent. To confirm a weak influence of sea-level changes on 
the Fe/Ca record, we compared our ln(Fe/Ca) record with a previous 
reconstruction of the deep-water δ¹⁸O component for relative sea level 
(b, bottom). Both records are plotted against the LR04 chronology. Visual 
inspection and statistical testing do not support a dominant effect of 
sea-level changes on the ln(Fe/Ca) record (R = 0.05). PC3, which describes 
11% of the total variance for the entire site MD96-2048, is closely related 
to sea-level changes. The negative loadings for PC3 are mainly strontium 
and, to a lesser degree, potassium and titanium, and the main positive 
loadings are zirconium and, to a lesser degree, silicon. 
Extended Data Fig. 2 | Control on the δD composition of precipitation 
in the Limpopo catchment. a, b, Seasonal δD composition of precipitation 
(a) and amount of precipitation at Pretoria station (b), in comparison to 
the results of the iLOVECLIM model at the corresponding latitude and 
longitude. All data are centred on their annual average. Depleted δD 
values are indicative of increasing amounts of rainfall. c, Results of the 
transient simulation with the isotope-enabled numerical climate model 
iLOVECLIM for the δD composition of precipitation and precipitation in 
the Limpopo catchment (about −27.5° S to −22° S and 30° E to 36° E), for 
the past 150 kyr (Methods). Black curves show the results after filtering 
with a low-pass filter. The δD composition of precipitation and 
precipitation amount in the Limpopo catchment are negatively correlated 
(R = −0.63, P < 0.001) for the past 150 kyr. Maxima of precipitation are 
phased with maxima in austral summer insolation at 30° S and lead to 
more-depleted δDprecipitation (amount effect). 

Extended Data Fig. 3 | Relationship between Limpopo runoff, local 
Southern Hemisphere insolation and the C31 n-alkane δ¹³C record for 
the past 800 kyr. a, Comparison between the ln(Fe/Ca) XRF signal and 
austral summer local insolation at 30° S. b, Comparison between the 
ln(Fe/Ca) XRF signal and the brGDGT concentration in the sediment. 
brGDGTs are commonly found in soil and can be attributed to Limpopo 
River runoff. c, Comparison between the ln(Fe/Ca) XRF signal and 
the C31 n-alkane δ¹³C record. An increased amount of Limpopo River 
discharge is associated with more C4 plant input and an increase in austral 
summer insolation at 30° S. d, Comparison between inverted ln(Fe/Ca) 
XRF signal and the accumulation rate (AR) of CaCO3 as a measure of 
biogenic carbonate. The ln(Fe/Ca) XRF record is not primarily controlled 
by dilution due to biological productivity (R = 0.1). A previous study of the 
past 0.8 Myr of core MD96-2048 interpreted shifts towards more-depleted 
δ¹³Cwax as potentially reflecting more-humid conditions. However, the 
anti-correlation between δ¹³Cwax and δDwax values (Extended Data Fig. 4) 
in our study indicates that enriched δ¹³Cwax values are associated with 
more-humid conditions. Because C4 plants in the Limpopo catchment 
are dominant in the interior (Fig. 1), we propose that more-enriched 
δ¹³Cwax values indicate a higher relative contribution from sources 
located farther upstream (more C4 plants) during times of high runoff, 
compared to only downstream sources (more C3 plants) during times of 
low discharge. In addition, humid conditions would have favoured the 
extension of sedge-rich vegetation (Cyperaceae, of which 20-60% are C4 
plants in this region) in riverine swamps and floodplains along the river 
course, explaining the detected increase in Cyperaceae pollen at times of 
increased fluvial discharge (Fig. 2). Studies of sediments from the adjacent 
Zambezi catchment similarly suggest the extension of swampy sedge-rich 
vegetation (including C4-Cyperaceae) when river discharge was high, 
and infer that more C4 plant waxes are exported to the ocean when the 
flooding of floodplains occurs during rainfall maxima. 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Figure, panel a: scatter plot of δD (y axis, ticks from −140 to −170) against δ¹³C of the C31 n-alkane (x axis, ‰, ticks from −27 to −23), with linear fit Y = −6.27x − 312, R² = 0.65.] 

Extended Data Fig. 4 | Relation between the δ¹³C C₃₁ n-alkanes record
and the δD C₃₁ n-alkanes record. a, Correlation between the record of
δ¹³C C₃₁ n-alkanes and the record of δD C₃₁ n-alkanes, with or without
vegetation and ice-volume correction (vc-ivf) over the past 2.14 Myr
(n = 19 samples). An anti-correlation exists between the δ¹³C and the δD
signals of the C₃₁ n-alkanes. The C₃₁ n-alkane is used because it is the
most abundant homologue in the samples. b, Raw δ¹³Cwax, δDwax data
and δDwax adjusted for ice-volume and vegetation changes from core
MD96-2048. Mean analytical uncertainties are indicated. Top, δ¹³Cwax
of the C₃₁ homologue (data from a previous study in light green, and
data from this study in dark green). Middle, δDwax of the C₃₁ homologue.
Bottom, δDwax of the C₃₁ homologue adjusted for ice-volume changes
(ivf) using a seawater δ¹⁸O curve and converting to δD assuming an
increase of 7.2‰ at the Last Glacial Maximum. We use 7.2‰ because
measurements of sediment pore water δ¹⁸O and δD suggest that the glacial
ocean δD increase has a mean value of 7.2‰. We also adjusted the
δDwax record for vegetation changes (vc) using published fractionation
factors (−123‰ ± 31‰ for C₃ trees, −139‰ ± 27‰ for C₄ grasses)
and the δ¹³Cwax signal following a previously published procedure.
End-member δ¹³Cwax values used for C₃ and C₄ vegetation were −36‰
and −21.5‰, respectively. The error ranges for the vegetation
fractionation factors are very large. They derive from the compilation
of a global dataset from individual plants, which is not comparable to an
ecosystem fractionation in a specific catchment (such as the Limpopo)
that will fractionate with a much smaller uncertainty. However, as we do
not know the exact fractionation factor in the Limpopo catchment and
regard the uncertainties from the global compilation as unrealistic for a
specific ecosystem, we refrained from propagating this uncertainty into
the vegetation corrections. The vegetation and ice-volume-adjusted δDwax
record is very similar to the unadjusted record, highlighting the fact that
the adjustments have a minor effect.
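
The vegetation adjustment described in this caption can be illustrated with a minimal sketch. It assumes a linear two-end-member mixing model for the C₄ fraction and the standard relation between δ values and an apparent fractionation factor; the caption cites a published procedure without giving its equations, so the exact form used by the authors may differ. The function names are hypothetical; only the end-member values and fractionation factors are taken from the caption.

```python
# End-member d13C values and apparent fractionation factors quoted in the
# caption (all in per mil). Everything else here is an illustrative assumption.
D13C_C3, D13C_C4 = -36.0, -21.5
EPS_C3_TREES, EPS_C4_GRASSES = -123.0, -139.0


def c4_fraction(d13c_wax):
    """Fraction of C4 input from linear two-end-member mixing, clipped to [0, 1]."""
    f = (d13c_wax - D13C_C3) / (D13C_C4 - D13C_C3)
    return min(1.0, max(0.0, f))


def vegetation_corrected_dD(dD_wax, d13c_wax):
    """Remove a vegetation-weighted fractionation from a leaf-wax dD value.

    Uses eps = ((dD_wax + 1000) / (dD_source + 1000) - 1) * 1000,
    solved for dD_source.
    """
    f4 = c4_fraction(d13c_wax)
    eps = f4 * EPS_C4_GRASSES + (1.0 - f4) * EPS_C3_TREES
    return (dD_wax + 1000.0) / (eps / 1000.0 + 1.0) - 1000.0
```

Because the two fractionation factors differ by only 16‰, shifts in the C₄ fraction change the weighted factor by a small amount in this sketch, in line with the caption's remark that the adjustments have a minor effect.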
Extended Data Fig. 5 | Statistical analyses for the ln(Fe/Ca) XRF
record and PC1 SST record. a, Spectral power for ln(Fe/Ca) by wavelet
analysis realized with a previously published MATLAB package. The
thick contour designates the 5% significance level against red noise.
Dashed black lines indicate the variability at the precession, obliquity and
eccentricity periods. b, Spectral analysis of ln(Fe/Ca) with REDFIT.
The red line shows the false-alarm level at the 95% confidence interval.
Spectral peaks exceeding the false-alarm level can be considered
significant. c, Blackman-Tukey cross correlation between ln(Fe/Ca)
XRF and eccentricity-tilt-precession (ETP) realized with the Analyseries
software for the past 2.14 Myr. ETP is constructed by normalizing and
stacking eccentricity, tilt (obliquity) and negative precession to evaluate
coherence and phase (timing) relative to orbital extremes. The red curve
shows the spectral power for the ln(Fe/Ca) record. The black curve shows
the spectral power for ETP. The coherency, which varies between 0 and 1,
is represented by the grey curve and gives the interval within which the
spectrum is significant. In our case, the non-zero coherency is higher than
0.55 and is significant at the 95% confidence interval (grey line). There
are significant spectral peaks for eccentricity and precession but not for
obliquity. The ln(Fe/Ca) XRF record and ETP are in phase at the 400-kyr
period, eccentricity leads the ln(Fe/Ca) record by 16 kyr at the 100-kyr
period and the ln(Fe/Ca) record is in anti-phase with negative precession
(in phase with positive precession) at the 19- and 23-kyr periods. The
three statistical analyses are consistent and indicate significant variability
at the 400-, 100-, 23- and 19-kyr periods and insignificant variability at
the 41-kyr period. d, Comparison between the precessional component
of the ln(Fe/Ca) record (Gaussian filter frequency 1/23,000; bandwidth:

5 x 10°) obtained with the Analyseries software*’ and the precession 
index. Maxima of the ln(Fe/Ca) precession component are in phase with
precession index maxima. The precession cycles in the ln(Fe/Ca) record
appear particularly strong between about 0.9 and 0.6 Ma. e-g, The same
statistical analyses as in a-c, respectively, but for the PC1 SST record. In
e, dashed white lines indicate the variability at the precession, obliquity
and eccentricity periods. The three statistical analyses indicate significant
variability at the 100- and 41-kyr periods but not significant power for the
400-kyr and 23-kyr (precession) periods.
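
The ETP construction mentioned in panel c (normalizing and stacking eccentricity, tilt and negative precession) can be sketched as follows. This is a minimal illustration that assumes "normalizing" means conversion to zero mean and unit standard deviation and that the three series share a common time axis; the Analyseries implementation may differ in detail, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    m, s = mean(series), pstdev(series)
    return [(x - m) / s for x in series]


def etp(eccentricity, tilt, precession):
    """Stack standardized eccentricity, tilt and sign-reversed precession."""
    e = standardize(eccentricity)
    t = standardize(tilt)
    p = standardize([-x for x in precession])  # negative precession
    return [a + b + c for a, b, c in zip(e, t, p)]
```

Standardizing before stacking gives each orbital parameter equal weight in the composite curve regardless of its physical units.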
Extended Data Fig. 6 | Reconstruction using SST proxies for core
MD96-2048 for the past 2.14 Myr. a, Reconstruction of SST using two
different methods: Mg/Ca reconstruction based on previous and new
data (Mg/Ca ratios were converted into temperature values by applying
a previously established equation) and foraminifera transfer function
reconstruction using the modern analogue technique. Error bars represent
the error on the calibrations (Extended Data Fig. 7). b, Empirical
orthogonal function analysis of the two SST records for the past
2.14 Myr. PC1 contains 74% of the total variance for the past 2.14 Myr.
Correlation between SST proxies and PC1 for the past 2.14 Myr is R = 0.71.
Extended Data Fig. 7 | Foraminifera transfer function used for core
MD96-2048. a, Location of the modern database, composed of 367 core
tops from the south Indian Ocean with present-day SST from the World
Ocean Atlas (WOA) 2009. b, Test for the modern database yielding
a precision of 0.8 °C for the annual SST reconstructions. Modern
hydrological parameters were obtained from the WOA (1998) database
using a previously developed tool (http://www.geo.uni-bremen.de/geomod/Sonst/Staff/csn/woasample.html).
Extended Data Table 1 | Fossil finds, their location and associated ages


Site/area Taxon Stratigraphic unit Methods References Estimated age of fossils
Malapa Australopithecus Biochronology Dirks et al. (137) From 1.95 Ma to 1.78 Ma
U-Pb dating of flowstones: basal flowstone 1 at 2.026 ± 0.021 Ma; capping flowstone 2 at 2.048 ± 0.140 Ma Pickering et al. (138) From 2.188 Ma to 1.908 Ma
Paleomagnetism: reversed polarity of flowstone 2, normal polarity of fossiliferous sediments Pickering et al. (138) Older than 1.95 Ma
Synthesis of flowstone dating and paleomagnetism, 1.977 ± 0.002 Ma Pickering et al. (138) From 1.979 to 1.975 Ma
Cooper's D Paranthropus robustus Biochronology Berger et al. (139) From 1.9 Ma to 1.6 Ma
U-Pb dating of flowstones: basal flowstone CDD1 at 1.526 ± 0.088 Ma; younger flowstone CDD3, intercalated within fossiliferous sequence, at ca. 1.4 Ma (imprecise dating from 1.617 Ma to 1.413 Ma) de Ruiter et al. (140) From 1.615 Ma to ca. 1.4 Ma (upper facies A and C) and younger than 1.4 Ma (lower facies A and C)
Drimolen P. robustus Main Quarry Site Biochronology Keyser et al. (141) From 2.0 Ma to 1.5 Ma
Biochronology Adams et al. (142) From 2.3 Ma to 1.6 Ma
Kromdraai B P. robustus Paleomagnetism of flowstone above Mb. 3, reverse polarity, older than normal Olduvai event (between 1.95 Ma and 1.78 Ma) Thackeray et al. (143) Older than 1.95 Ma
Alternative interpretation of paleomagnetic data from Thackeray et al. (2002) Herries et al. (144); Herries & Adams (145) From 1.78 Ma to 1.65 Ma
Biochronology (including hominins) and paleomagnetism Braga et al. (146) Older than 2.18 Ma
Gondolin P. robustus GD2* Biochronology and paleomagnetism (Olduvai normal polarity event of fossiliferous sediments) Herries et al. (147) Slightly older than 1.78 Ma
P. robustus GD1* Biochronology and paleomagnetism (end of Olduvai normal polarity event in basal flowstone) Adams et al. (148) Slightly younger than 1.78 Ma
Sterkfontein P. robustus Mb. 5B "Oldowan infill" Biochronology and lithic typology Kuman & Clarke (106) From 2.0 Ma to 1.7 Ma
Mb. 5B "Oldowan infill" ESR on bovid teeth (seven dates): 1.328 ± 0.087 Ma; 1.315 ± 0.295 Ma; 1.185 ± 0.96 Ma; 1.265 ± 0.125 Ma; 1.620 ± 0.626 Ma; 0.965 ± 0.147 Ma; 1.24 ± 0.28 Ma; weighted mean = 1.32 ± 0.08 Ma Curnoe (149) reinterpreted in Herries & Shaw (150) From 1.40 Ma to 1.24 Ma
Mb. 5B "Oldowan infill" Synthesis of ESR, U-Pb dating, paleomagnetism Herries & Shaw (150) From ca. 1.4 Ma to 1.2 Ma
Mb. 5B "Oldowan infill" ESR on teeth from 0.965 ± 0.147 Ma to 1.328 ± 0.087 Ma; weighted mean LU-ESR (excluding one tooth with large internal errors) = 1.223 ± 0.155 Ma Curnoe (149) reinterpreted in Herries et al. (144) From … Ma to … Ma (maximal); 1.112 Ma to 0.818 Ma (minimal); from 1.378 Ma to 1.223 Ma (weighted mean)
Mb. 5B "Oldowan infill" Cosmogenic burial (²⁶Al/¹⁰Be) dating of a quartz manuport 2.18 ± 0.21 Ma Granger et al. (151) From 2.39 Ma to 1.97 Ma
Mb. 5B "Oldowan infill" Synthesis of paleomagnetism, U-Pb dating, and ESR Herries & Adams (145) From 1.8 Ma to 1.5 Ma
Mb. 5B "Oldowan infill" Synthesis of biochronology and ESR Herries et al. (144) From 1.38 Ma to 1.07 Ma
Swartkrans P. robustus Mb. 1 Cosmogenic burial (²⁶Al/¹⁰Be) dating on quartz 2.19 ± 0.08 Ma Gibbon et al. (152) From 2.27 Ma to 2.11 Ma
Mb. 1 Cosmogenic burial (²⁶Al/¹⁰Be) dating on quartz 1.80 ± 0.09 Ma Gibbon et al. (152) From 1.89 Ma to 1.71 Ma
Mb. 1 Hanging Remnant Synthesis of U-Pb dating and ESR Herries & Adams (145) From 2.0 Ma to 1.8 Ma


. 1 Hanging Remnant 


ESR 2.02 + 0.36 Ma; 2.07 + 0.37 Ma; 1.68 + 0.28 Ma; weighed mean 
= 1.96 Ma-1.70 Ma 


. 1 Hanging RemnantESR LU 1.39 + 0.18 Ma 


. 1 Hanging RemnantESR LU 1.92 + 0.34 Ma 


. 1 Hanging RemnantESR LU 1.21 + 0.22 Ma 


Curnoe et al. (153) 
reinterpreted in Herries 
& Adams (145) 


Herries et al. (144) 
Herries et al. (144) 


Herries et al. (144) 


From 1.96 Ma to 1.70 Ma 


From 1.57 Ma to 1.21 Ma 


From 2.26 Ma to 1.58 Ma 


From 1.43 Ma to 0.99 Ma 


Mb. 1 Lower Bank U-Pb dating Pickering et al. (154) From 2.3 to 1.6 Ma 
Mb. 1 U-Pb dating on tooth 1.83 ± 1.38 Ma Balter et al. (155) From 3.21 to 0.45 Ma
Mb. 1 Biochronology Vrba (156, 157); Churcher & Watson (158) ca. 1.7 Ma
Mb. 1 U-Pb dating of flowstones: basal flowstone 2.249 ± 0.077 Ma; top flowstone 1.706 ± 0.069 Ma Pickering et al. (154) Maximal range from 2.326 Ma to 1.637 Ma; minimal range from 2.172 Ma to 1.775 Ma
Swartkrans P. robustus Mb. 2 U-Pb dating on tooth 1.36 ± 0.29 Ma Balter et al. (155) From 1.65 Ma to 1.07 Ma
Mb. 2 Biochronology Vrba (156, 157); Churcher & Watson (158) ca. 1.5 Ma
Mb. 2 Relative position to Mb. 1 dated by U-Pb Pickering et al. (154) Younger than ca. 1.7 Ma
Swartkrans P. robustus Mb. 3 Synthesis of U-Pb and ESR dating Herries & Adams (145) From 1.3 Ma to 0.6 Ma
Mb. 3 Cosmogenic burial on quartz 0.96 ± 0.09 Ma Gibbon et al. (152) From 1.05 Ma to 0.87 Ma
Mb. 3 ESR 0.65 ± 0.15 Ma Cited in Herries and Adams (145) From 0.8 Ma to 0.5 Ma
Mb. 3 ESR 1.25 ± 0.09 Ma Cited in Herries and Adams (145) From 1.34 Ma to 1.16 Ma
Mb. 3 U-Pb dating on tooth 0.83 ± 0.21 Ma Balter et al. (155) From 1.04 Ma to 0.62 Ma
Mb. 3 Biochronology Vrba (156, 157); Churcher & Watson (158) ca. 1.0 Ma
Mb. 3 ESR LU on two bovid teeth (four dates): 0.71 ± 0.90 Ma and 0.80 ± 0.16 Ma; 0.65 ± 0.15 Ma and 0.70 ± 0.11 Ma; mean = 0.72 ± 0.13 Ma Blackwell (159) From 0.63 Ma to 0.52 Ma
Mb. 3 Synthesis of biochronology and U-Pb Herries et al. (144) From 1.04 Ma to 0.62 Ma 
Data in this table were taken from refs 106, 137–160. We consider the ages in bold as the best estimate. The different dating methods did not yield any agreement regarding the age of Kromdraai B and
Sterkfontein member 5 ‘Oldowan Infill’. Therefore, no estimate is highlighted in bold and the stratigraphic ranges are not shown in Fig. 3. We favour U-Pb dates and cosmogenic burial of quartz dates
rather than biochronology or electron spin resonance, although dates produced using the latter two methods are generally not inconsistent with those produced using the other methods¹⁴⁵. *P. robustus
fossils were not found at Gondolin GD 1 and GD 2 but nearby ex situ. Given the close age between Gondolin GD 1 and GD 2 deposits and the limited extent of outcrops, it has been suggested that
the ex situ hominin specimens from Gondolin are dated to around 1.78 Ma.
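
For most rows that report a single radiometric date, the 'Estimated age of fossils' column appears to be the dated value plus and minus its quoted uncertainty. The following minimal sketch makes that arithmetic explicit; the function name is hypothetical, and rows based on syntheses or biochronology do not follow this rule.

```python
def age_range(age_ma, uncertainty_ma, digits=3):
    """Return (oldest, youngest) bounds in Ma for a date of age ± uncertainty."""
    oldest = round(age_ma + uncertainty_ma, digits)
    youngest = round(age_ma - uncertainty_ma, digits)
    return oldest, youngest
```

For example, the Swartkrans Mb. 1 cosmogenic burial date of 2.19 ± 0.08 Ma corresponds to the tabulated range from 2.27 Ma to 2.11 Ma.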
Extended Data Table 2 | δ¹³C enamel of hominin and contemporaneous herbivores and associated statistical parameters for different sites
in the Limpopo catchment


δ¹³C (‰)
Group | n | mean | median | mode(s) (2‰ interval) | SD | min | max | range | References
Cooper's D herbivores | 45 | -5.0 | -4.7 | int. 1: -12 to -10‰; int. 2: -2 to 0‰ | 3.9 | -11.5 | 2.5 | 14.0 | Steininger (101)
Gondolin GD2 herbivores | 21 | -2.9 | 1.0 | int. 1: -10 to -8‰; int. 2: 0 to 2‰ | 5.3 | -11.1 | 3.5 | 14.6 | Adams (102)
Swartkrans Mb. 1 herbivores | 56 | -4.8 | -3.8 | int. 1: -10 to -8‰; int. 2: -4 to 0‰ | 4.2 | -12.4 | 2.2 | 14.6 | Lee-Thorp et al. (99); Sponheimer et al. (83); Steininger (160)
Swartkrans Mb. 1 Paranthropus | 18 | -7.2 | -6.9 | int.: -8 to -6‰ | 1.2 | -9.6 | -4.9 | 4.7 | Lee-Thorp et al. (99); Sponheimer et al. (84, 83)
Swartkrans Mb. 1 Homo | 3 | -8.2 | -8.2 | int.: -10 to -8‰ | 0.9 | -9.2 | -7.1 | 2.1 | Lee-Thorp et al. (100)
Swartkrans Mb. 2 herbivores | 53 | -4.8 | -3.8 | int. 1: -12 to -10‰; int. 2: -4 to -2‰ | 4.4 | -12.9 | 2.2 | 15.1 | Lee-Thorp et al. (99, 100); Steininger (160)
Swartkrans Mb. 2 Paranthropus | 2 | -9.1 | -9.1 | | | -10.0 | -8.1 | 1.9 | Lee-Thorp et al. (99)
Swartkrans Mb. 3 herbivores | 12 | -3.8 | -2.2 | int.: -4 to 0‰ | 3.3 | -11.6 | -0.5 | 11.1 | Steininger (160)
Swartkrans Mb. 3 Paranthropus | 1 | -7.9 | -7.9 | | | -7.9 | -7.9 | | Lee-Thorp et al. (99)
Globally rising soil heterotrophic respiration over recent decades


Ben Bond-Lamberty, Vanessa L. Bailey, Min Chen, Christopher M. Gough & Rodrigo Vargas


Global soils store at least twice as much carbon as Earth’s
atmosphere¹,². The global soil-to-atmosphere (or total soil
respiration, RS) carbon dioxide (CO2) flux is increasing³,⁴, but
the degree to which climate change will stimulate carbon losses
from soils as a result of heterotrophic respiration (RH) remains
highly uncertain⁵–⁸. Here we use an updated global soil respiration
database⁹ to show that the observed soil surface RH:RS ratio
increased significantly, from 0.54 to 0.63, between 1990 and 2014
(P = 0.009). Three additional lines of evidence provide support
for this finding. By analysing two separate global gross primary
production datasets¹⁰,¹¹, we find that the ratios of both RH and RS
to gross primary production have increased over time. Similarly,
significant increases in RH are observed against the longest
available solar-induced chlorophyll fluorescence global dataset,
as well as gross primary production computed by an ensemble of
global land models. We also show that the ratio of night-time net
ecosystem exchange to gross primary production is rising across
the FLUXNET2015¹² dataset. All trends are robust to sampling
variability in ecosystem type, disturbance, methodology, CO2
fertilization effects and mean climate. Taken together, our findings
provide observational evidence that global RH is rising, probably
in response to environmental changes, consistent with
meta-analyses¹³–¹⁶ and long-term experiments¹⁷. This suggests that
climate-driven losses of soil carbon are currently occurring across
many ecosystems, with a detectable and sustained trend emerging
at the global scale.

The sensitivity of RH to ongoing changes in temperature, precipitation
and organic matter input to the soil system remains highly
uncertain. Because of the large stocks of global soil organic carbon
content (SOC) and quickly changing climatic conditions, this uncertainty
has large implications for predicting future dynamics of Earth’s
climate system and carbon (C) cycle⁵. But because RH observations are
infrequently performed, necessarily small-scale and highly variable, it
is difficult to detect and attribute annual-to-decadal changes at larger
spatial scales.

One way to quantify and interpret shifts in the flux of C from SOC
is to examine the changing ratios between RH and other parts of the
C cycle. For example, widespread increases over time in the ratio of
RH to the larger RS flux, a ratio which typically varies in a predictable
manner¹⁸, would suggest that RH is rising. Quantifying this change
would provide new constraints on whether greater mineralization
of SOC is occurring, and the degree to which increased inputs by
enhanced gross primary production (GPP) are affecting RH⁸,¹⁹. We used
a global soil respiration database (SRDB)⁹, expanded and updated to
include studies reporting data through 2014, to test these possibilities.

The mean RH:RS ratio observed in SRDB has risen over time, from
0.54 ± 0.18 in 1990-1998 to 0.63 ± 0.16 in 2007-2014 (Fig. 1). This
change is significant (P = 0.009, n = 318) when ecosystem mean annual
temperature (MAT) and mean annual precipitation (MAP) are controlled
for in a linear model (Table 1, Extended Data Table 1, Extended
Data Fig. 1). This trend in the RH:RS ratio was not induced by a site
selection bias, as this model also controls for the possibility of researchers
sampling more disturbed sites, sites with varying land cover or SOC
stocks, and using different methods to partition RH from RS. The SRDB
covers most of the global climate space (from −13.2 °C to 27.9 °C MAT,
and from 63 mm to 4,563 mm MAP; Extended Data Fig. 2); when we
examined climate controls²⁰ on RH, MAP (P < 0.001, exerting a positive
effect) and potential evapotranspiration (P = 0.006) had the strongest
effects on annual RH, with disturbance and temperature exerting
complex interactive effects.

A rising RH:RS ratio could be due to rising SOC losses and thus a
climate feedback, and/or increasing GPP rates enhancing detritus
inputs and thus counterbalancing C losses from SOC. To distinguish
these possibilities, we examined the ratio of all soil-derived respiration
fluxes to GPP²¹, the ultimate source of both the autotrophic and heterotrophic
soil surface CO2 fluxes. We note that the more relevant ratio to
study is that of RH to net primary production (NPP) rather than to GPP,
Fig. 1 | Changes in RH:RS over time. Data are drawn from unmanaged,
unmanipulated ecosystems in a global soil respiration database⁹, binned
into 8-9 year groups (this binning is for display only, and was not part
of the statistical analysis summarized in Table 1), and shown with linear
regression lines; for clarity, one extremely low value is not shown. Note
logarithmic axes. Inset, changes in RH:RS density distribution over time.
See also Extended Data Table 1.
Table 1 | Effects in linear model of how RH:RS changes over time


Effect Degrees of freedom Sum of squares Mean sum of squares F P
Year 1 0.151 0.151 6.972 0.009
Disturbance 1 0.018 0.018 0.826 0.364
Partitioning method 10 0.363 0.036 1.677 0.086
MAT 1 0.439 0.439 20.293 <0.001
MAP 1 0.005 0.005 0.238 0.626
SOC 1 0.408 0.408 18.877 <0.001
Year * Method 7 0.432 0.062 2.853 0.007
MAT * MAP 1 0.075 0.075 3.460 0.064
Year * SOC 1 0.058 0.058 2.656 0.104
Residuals 293 6.339 0.022
This table summarizes the statistically significant effects in the linear model examining how RH:RS
changes over time. Effects tested include year of observation, disturbance (ecosystem coded as
aggrading versus mature), method used to partition RH from RS, land cover (deciduous forest,
evergreen forest, grassland, savannah, other), MAT, MAP and SOC. Terms that were not significant
in the final model do not appear in the table. An asterisk denotes an effect interaction.
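
The F values in Table 1 follow from the tabulated sums of squares and degrees of freedom in the usual way for an analysis-of-variance table: each effect's mean square is divided by the residual mean square. The following minimal sketch reproduces that arithmetic from the rounded values printed in the table, so small differences from the published F values are expected; the function name is hypothetical.

```python
# Residual sum of squares and degrees of freedom as printed in Table 1.
RESIDUAL_SS, RESIDUAL_DF = 6.339, 293


def f_statistic(sum_sq, df, resid_ss=RESIDUAL_SS, resid_df=RESIDUAL_DF):
    """F ratio for one effect: (effect SS / effect df) / (residual SS / residual df)."""
    mean_sq = sum_sq / df
    resid_mean_sq = resid_ss / resid_df
    return mean_sq / resid_mean_sq
```

Calling `f_statistic(0.439, 1)` for the MAT row, for instance, gives a value that agrees with the tabulated 20.293 to within rounding of the printed sums of squares.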


but global NPP estimates are subject to more uncertain assumptions. If
increased SOC mineralization rates are raising RH, one would expect
both RH and the total surface efflux RS to rise relative to GPP.

The available data support this expectation as both RH (the variable
of primary interest, but for which far fewer measurements are available)
and RS (which incorporates both RH and belowground autotrophic
respiration) exhibit significant positive trends over time (P = 0.012
for RH:GPPMTE, P < 0.01 for all others; Fig. 2a) relative to GPP. These
results are controlled, as above, for climate, land cover, partitioning
method and disturbance. Grasslands exhibited much stronger RS:GPP
trends than forests for both the MODIS¹¹ and MTE¹⁰ GPP datasets, perhaps
because of their frequently high belowground carbon allocation.

Solar-induced chlorophyll fluorescence (SIF) is correlated with
GPP and provides an independent test of how RH is related to carbon
assimilation over time. We examined the ratios of RH and RS to two
global SIF data products (SCIAMACHY and GOME-2; see Methods).
Both RH and RS exhibited temporal trends against the SCIAMACHY
data after controlling for disturbance and measurement method
(P < 0.001 for both; Fig. 2a). No RH:SIF temporal trend was found
using the GOME-2 SIF data, and RS:SIF was only marginally significant
(P = 0.045), but this is by far the shortest (2007-2014) of the datasets
tested.

It is important to note that remotely sensed GPP and SIF estimates
may not fully capture GPP increases owing to, for example, CO2
fertilization²². The results above are, however, robust to such potential
‘missed’ GPP, even if one assumes a high rate of CO2-driven GPP
increase completely missed by satellite-based products (Extended Data
Figs. 4 and 5). We also found a significant (P = 0.008, n = 263) temporal
RH:GPP trend using GPP from a wide range of global models²³ that
include CO2 fertilization mechanisms (Fig. 2b), again controlling for
the previously mentioned variables. This underscores the robustness
of the trends shown in Fig. 2a.

A second challenge to the above analysis is the large spatial mismatch
between remotely sensed GPP and SIF data (over 10⁶ m²) versus RH and
RS measurements (about 1 m²). To address this, we used eddy-covariance
C flux data reported in the FLUXNET2015 database to examine temporal
changes in co-located ecosystem respiration and production. Night-time
net ecosystem exchange (NEEnight) is generally dominated²⁴ by RS.
Consequently, NEEnight might increase relative to GPP over time, if RH
and thus RS are rising. We used the full ‘Tier 1’ dataset (n = 1,162) to
test this expectation and found that the annual NEEnight:GPPFLUXNET ratio
is significantly rising with time (P = 0.002). We also attempted to use
FLUXNET data to identify GPP measurements made in the same study
site and year as site-specific RS measurements, but from a bootstrap
analysis, concluded that the small size of this dataset was unlikely to be
adequate for this purpose (Extended Data Figs. 7 and 8).

A third problem is that, as in many previous global syntheses, the 
above analyses combine time and space responses. For a final test, 
Fig. 2 | Changes in ratios of respiration to production over time.
a, Changes in the ratio of respiration to GPP and SIF over time. Two
respiration fluxes (RH and RS), two GPP sources (the MTE and MODIS
datasets), and two SIF sources (the SCIAMACHY and GOME-2 datasets)
are shown. For clarity, several high-ratio points are cut off. Points and
linear regression lines are coloured by land cover (see key in b); lines
in a panel imply that the overall temporal trend was significant in that
panel. Grey shading shows 95% confidence intervals; residuals from these
models are shown in Extended Data Fig. 3. b, Changes in the ratio of
field-measured RH to GPP modelled by a suite of land models²³ over time. The
trend line shows the statistically significant (P < 0.001) positive temporal
trend in RH:GPP using the GPP of ISIMIP models (see Table 1) and
accounting for climate, land cover, disturbance and so on. c, Site-specific
RH trends in managed, unmanaged and natural ecosystems. Linear trend
(not rising, slope < 0, versus rising, slope > 0) of RH for sites in the SRDB⁹
reporting at least three annual RH measurements over at least eight years.
Table 2 | Evidence supporting a global rise in soil RH


Test n Temporal trend? Land cover significant?
Increasing RH:RS 318 Yes No
Increasing RH:GPPMTE 266 Yes No
Increasing RS:GPPMTE 1,622 Yes* Yes
Increasing RH:GPPMODIS 251 Yes Yes
Increasing RS:GPPMODIS 256 Yes Yes
Increasing RH:SIFSCIA 83 Yes No
Increasing RS:SIFSCIA 879 Yes No
Increasing RH:SIFGOME-2 90 No No
Increasing RS:SIFGOME-2 402 Yes No
Increasing RS:GPPFLUXNET 21 No Yes
Increasing NEEnight:GPPFLUXNET 1,162 Yes Yes
Increasing RH:GPPISIMIP 263 Yes No
RH climate response 332 N/A No
Tests examined changes in ratios of RH to RS; RH and RS to GPP and SIF; NEEnight to FLUXNET¹²
GPP; ISIMIP²³ GPP; and the response of RH to climate. Columns include number of observations
(n), significance (P < 0.05) of a 1989-2014 temporal trend after climate and other factors are
accounted for in a linear regression, and whether the temporal trend was influenced by land cover
(coniferous forest, deciduous forest, grassland, savannah, other). All temporal trends were also
significant in a Theil-Sen robust regression unless otherwise noted with an asterisk.
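
The Theil-Sen estimator named in this footnote takes the median of the slopes between all pairs of points, which makes it far less sensitive to outliers than ordinary least squares. A minimal sketch of the slope estimate follows; it is a generic illustration rather than the authors' analysis code, the function name is hypothetical, and the values in the usage note are toy numbers, not data from the study.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Median of pairwise slopes (y_j - y_i) / (x_j - x_i) over all pairs with x_j != x_i."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1
    ]
    return median(slopes)
```

With toy inputs `x = [0, 1, 2, 3, 4]` and `y = [0, 1, 2, 3, 40]`, six of the ten pairwise slopes equal 1, so the single outlying final value does not move the median slope away from 1.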


we examined longitudinal, site-level RH records from both managed
and unmanaged ecosystems. There are only 13 sites in SRDB with such
long-term records, and 8 of these exhibit rising RH (Fig. 2c); when these
data are pooled and controlled for climate, mean RH also exhibits a
significant (P < 0.001) rising trend. This sample is very small and lies
in a tighter climate change space than the main dataset (Extended Data
Fig. 9), but the prevailing positive trend for a majority of sites is consistent
with the analyses above given the site-specific diversity of factors
controlling RH responses to climate and atmospheric CO2 changes.
Several sensitivity analyses (see Methods) suggest that, because of
their high variability and short record length, individual sites in both
FLUXNET and SRDB remain unable to reliably detect an increase in
RH; the signal is only slowly emerging from the noise, even for sites
with multi-decadal records and across ecosystem types (Extended
Data Fig. 6).

In summary, we have shown that multiple independent results
(Table 2) converge to point towards a consistent finding of rising global
RH. This could be explained by at least two mechanisms, which are not
mutually exclusive.

First, increased RH might be temporarily fuelled by shifts in SOC
forms and availability, and thus increased substrates for SOC mineralization,
with little or no change in total SOC⁸. For example,
eddy-covariance measurements from >10-yr FLUXNET sites indicate²⁵,²⁶
that increases in GPP are outpacing concurrent rises in ecosystem respiration,
resulting in greater ecosystem C uptake and detritus production,
and thus greater bioavailable C for microbial metabolism. Highly productive
grasslands and deciduous forests might be expected to respond
more quickly to abiotic drivers²⁷, and more slowly decomposing evergreen
litter may even inhibit microbial activities²⁸, together potentially
explaining the trends shown in Fig. 2. This would be consistent with
a 2015 inferred (as a residual, not measured) land sink of 1.9 ± 0.9 Pg
C yr⁻¹ (ref. 29), but such a substrate-driven mechanism is not likely to
be permanent, given the imbalance between soil C inputs and outputs
(Fig. 2a). The C mass balance reported by FLUXNET sites exhibiting
increased GPP and litter production²⁵,²⁶ also suggests that new substrate
inputs are not rising fast enough to keep up with increasing RH.

A second and more tenable possibility is that rising RH reflects
enhanced SOC mineralization driven by climate changes, in particular
rising global temperatures. If (as we calculate) global RH has risen
by approximately 1.2% over 25 years, against a mean air temperature
change of +0.7 °C, this would be broadly consistent with warming
experiment syntheses¹³–¹⁶. Assuming that rising RH does in fact reflect
SOC losses, this raises a consistency problem with estimated increases
in the terrestrial C sink²⁹. It also poses a methodological challenge:
small changes in RH and SOC are difficult and perhaps impossible
(see Methods) to detect at individual sites, even those with a decade or
more of data, but will add up to substantial losses at the global scale.

A number of limitations should be noted. The SRDB (as well as most
Earth-observing networks such as FLUXNET) is dominated by data
from Northern Hemisphere, upland sites, and there are relatively few
data from high latitudes and tropics (about 14% of SRDB for each),
relative to the large areas and carbon pools of these biomes. Some of the
data are also not fully independent: for example, RH is usually measured
independently of RS, but occasionally (3-5% of these data) it is estimated
by subtraction of autotrophic respiration from RS, introducing
an autocorrelation. Finally, any observational analysis such as this infers
causality, and thus it is necessary to maintain and expand long-term
manipulative experiments¹³,¹⁷.

These results pose new scientific challenges and opportunities for
model benchmarking, hypothesis generation and testing, ecological
forecasting and experiments. Rigorously testing for global SOC changes
will require improvements in measuring, reporting and verification
protocols, as it is not currently possible to derive a time-variant SOC
map from repeated inventories except in ‘Tier 3’ countries³⁰. More open
global datasets of SOC stocks and fluxes⁹, and more broadly distributed
long-term ecological study sites with systematic and repeated RS, RH
and SOC measurements will help to test this finding of rising global RH
in the face of a seemingly stable or increasing terrestrial C sink.
Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0358-x.


Received: 30 March 2017; Accepted: 11 June 2018; 
Published online 1 August 2018. 


1. Köchy, M., Hiederer, R. & Freibauer, A. Global distribution of soil organic
carbon—Part 1: Masses and frequency distributions of SOC stocks for the
tropics, permafrost regions, wetlands, and the world. Soil 1, 351-365 (2015).

2. Scharlemann, J. P. W., Tanner, E. V. J., Hiederer, R. & Kapos, V. Global soil carbon:
understanding and managing the largest terrestrial carbon pool. Carbon Manag.
5, 81-91 (2014).

3. Bond-Lamberty, B. & Thomson, A. M. Temperature-associated increases in the
global soil respiration record. Nature 464, 579-582 (2010).

4. Hashimoto, S. et al. Global spatiotemporal distribution of soil respiration
modeled using a global database. Biogeosciences 12, 4121-4132 (2015).

5. Friedlingstein, P. et al. Uncertainties in CMIP5 climate projections due to carbon
cycle feedbacks. J. Clim. 27, 511-526 (2014).

6. Zhou, T., Shi, P., Hui, D. & Luo, Y. Global pattern of temperature sensitivity of soil
heterotrophic respiration (Q10) and its implications for carbon-climate
feedback. J. Geophys. Res. Biogeosci. 114, G02016 (2009).

7. Trumbore, S. E. & Czimczik, C. I. An uncertain future for soil carbon. Science
321, 1455-1456 (2008).

8. Giardina, C. P., Litton, C. M., Crow, S. E. & Asner, G. P. Warming-related increases
in soil CO2 efflux are explained by increased below-ground carbon flux.
Nat. Clim. Change 4, 822-827 (2014).

9. Bond-Lamberty, B. & Thomson, A. M. A global database of soil respiration data.
Biogeosciences 7, 1915-1926 (2010).

10. Jung, M. et al. Global patterns of land-atmosphere fluxes of carbon dioxide,
latent heat, and sensible heat derived from eddy covariance, satellite, and
meteorological observations. J. Geophys. Res. Biogeosci. 116, G00J07 (2011).

11. Zhao, M., Heinsch, F. A., Nemani, R. R. & Running, S. W. Improvements of the
MODIS terrestrial gross and net primary production global data set. Remote
Sens. Environ. 95, 164-176 (2005).

12. Baldocchi, D. D. ‘Breathing’ of the terrestrial biosphere: lessons learned from a
global network of carbon dioxide flux measurement systems. Aust. J. Bot. 56,
1-26 (2008).

13. Crowther, T. W. et al. Quantifying global soil carbon losses in response to
warming. Nature 540, 104-108 (2016).

14. Lu, M. et al. Responses of ecosystem carbon cycle to experimental warming: a
meta-analysis. Ecology 94, 726-738 (2013).

15. Wang, X. et al. Soil respiration under climate warming: differential response of
heterotrophic and autotrophic respiration. Glob. Change Biol. 20, 3229-3237
(2014).

16. Zhou, L. et al. Interactive effects of global change factors on soil respiration and
its components: a meta-analysis. Glob. Change Biol. 22, 3157-3169 (2016).

17. Melillo, J. M. et al. Long-term pattern and magnitude of soil carbon feedback to
the climate system in a warming world. Science 358, 101-105 (2017).


© 2018 Springer Nature Limited. All rights reserved. 


18. Bond-Lamberty, B., Wang, C. & Gower, S. T. A global relationship between the
heterotrophic and autotrophic components of soil respiration? Glob. Change
Biol. 10, 1756-1766 (2004).

19. Davidson, E. A. & Janssens, I. A. Temperature sensitivity of soil carbon
decomposition and feedbacks to climate change. Nature 440, 165-173 (2006).

20. Hursh, A. et al. The sensitivity of soil respiration to soil temperature, moisture,
and carbon supply at the global scale. Glob. Change Biol. 23, 2090-2103 (2017).

21. Vargas, R. et al. On the multi-temporal correlation between photosynthesis and
soil CO2 efflux: reconciling lags and observations. New Phytol. 191, 1006-1017
(2011).

22. Anav, A. et al. Spatiotemporal patterns of terrestrial gross primary production:
a review. Rev. Geophys. 53, 2015RG000483 (2015).

23. Warszawski, L. et al. The Inter-Sectoral Impact Model Intercomparison Project
(ISI-MIP): project framework. Proc. Natl Acad. Sci. USA 111, 3228-3232 (2014).

24. Falge, E. et al. Seasonality of ecosystem respiration and gross primary
production as derived from FLUXNET measurements. Agric. For. Meteorol. 113,
53-74 (2002).

25. Pilegaard, K., Ibrom, A., Courtney, M. S., Hummelshøj, P. & Jensen, N. O.
Increasing net CO2 uptake by a Danish beech forest during the period from
1996 to 2009. Agric. For. Meteorol. 151, 934-946 (2011).

26. Urbanski, S. P. et al. Factors controlling CO2 exchange on timescales from hourly
to decadal at Harvard Forest. J. Geophys. Res. 112, G02020 (2007).

27. Keenan, T. F. et al. Increase in forest water-use efficiency as atmospheric carbon
dioxide concentrations rise. Nature 499, 324-327 (2013).

28. Adamczyk, S., Adamczyk, B., Kitunen, V. & Smolander, A. Monoterpenes and
higher terpenes may inhibit enzyme activities in boreal forest soil. Soil Biol.
Biochem. 87, 59-66 (2015).

29. Le Quéré, C. et al. Global carbon budget 2016. Earth Syst. Sci. Data 8, 605-649
(2016).

30. Vargas, R., Paz, F. & de Jong, B. Quantification of forest degradation and
belowground carbon dynamics: ongoing challenges for monitoring, reporting
and verification activities for REDD+. Carbon Manag. 4, 579-582 (2013).


LETTER 


Acknowledgements We are indebted to the thousands of researchers who 
measured and published the data collected here. This work used eddy- 
covariance data acquired and shared by the FLUXNET community (see 
Methods). This research was supported by the US Department of Energy, Office 
of Science, Biological and Environmental Research as part of the Terrestrial 
Ecosystem Sciences Program. The Pacific Northwest National Laboratory is 
operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830.
R.V. acknowledges support from NASA-CMS (80NSSC18K0173)
and USDA (2014-67003-22070). C.M.G. received additional support from 
the National Science Foundation Division of Environmental Biology, Award 
1353908. 


Reviewer information Nature thanks A. Konings, K. Ogle and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 


Author contributions B.B.-L. conceived this study, and together with C.M.G. and 
R.V. designed the analysis. M.C. provided SIF data processing and expertise, 
and with V.L.B. furnished key insights. B.B.-L. wrote the manuscript in close 
collaboration with all authors. 


Competing interests The authors declare no competing interests. 


Additional information 

Extended data is available for this paper at
https://doi.org/10.1038/s41586-018-0358-x.

Supplementary information is available for this paper at
https://doi.org/10.1038/s41586-018-0358-x.

Reprints and permissions information is available at
http://www.nature.com/reprints.

Correspondence and requests for materials should be addressed to B.B.-L. 
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional 
claims in published maps and institutional affiliations. 


2 AUGUST 2018 | VOL 560 | NATURE | 83 




METHODS 

Datasets. This study used version v20170630a of the Global Soil Respiration
Database (SRDB)⁹, downloaded 30 June 2017 from https://github.com/bpbond/srdb.
We used only records in the SRDB from studies that (i) reported annual
Rh and/or Rs; (ii) had spatial (longitude and latitude) and temporal (measurement
years) information; (iii) took place in natural or unmanaged ecosystems;
(iv) had no experimental manipulation; and (v) used infrared gas analysers or gas
chromatography (as opposed to, for example, soda lime measurements that may
underestimate CO2 fluxes).
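
The five selection criteria above amount to a simple record filter. The sketch below is only an illustration: the `Record` fields and the method labels are hypothetical names chosen for this example and are not the actual SRDB column names or codes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # All field names are hypothetical; they are NOT the real SRDB columns.
    rs_annual: Optional[float]   # annual soil respiration, if reported
    rh_annual: Optional[float]   # annual heterotrophic respiration, if reported
    latitude: Optional[float]
    longitude: Optional[float]
    study_year: Optional[float]
    managed: bool                # True for managed ecosystems
    manipulated: bool            # True if an experimental manipulation was applied
    method: str                  # measurement method label (assumed vocabulary)


# Assumed labels for the two accepted measurement techniques.
ACCEPTED_METHODS = {"IRGA", "gas chromatography"}


def keep_record(r: Record) -> bool:
    """Return True if a record passes criteria (i)-(v) described in the text."""
    has_flux = r.rs_annual is not None or r.rh_annual is not None        # (i)
    has_space = r.latitude is not None and r.longitude is not None       # (ii)
    has_time = r.study_year is not None                                  # (ii)
    natural = not r.managed                                              # (iii)
    unmanipulated = not r.manipulated                                    # (iv)
    good_method = r.method in ACCEPTED_METHODS                           # (v)
    return (has_flux and has_space and has_time
            and natural and unmanipulated and good_method)


def filter_records(records: List[Record]) -> List[Record]:
    """Keep only the records that satisfy every criterion."""
    return [r for r in records if keep_record(r)]
```

A record measured with soda lime, or one lacking coordinates, is dropped by this filter, whatever flux it reports.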

In general, the SRDB is oriented to seasonal to annual Rs data, and focuses on 
spatial breadth (a large number of sites around the world) rather than temporal 
detail (it does not attempt to capture continuous Rs measurements, for example). 
The annual fluxes used here occasionally come from continuous year-round meas- 
urements, but more often are computed by authors from interpolated manual 
measurements (mean measurement separation, 21 days; mean annual coverage, 
89%). Such interpolation generally produces robust annual flux estimates.
Each site (that is, each unique longitude/latitude/ecosystem type combination) 
has on average 1.8-2.0 years of data, although a few have longer available records 
(see below). 

These data (n = 334 for Rh and n = 1,852 for Rs) were then spatially and temporally
matched with a variety of ancillary datasets using the R ‘raster’ package version
2.5.8. Air temperature, precipitation and potential evapotranspiration came from 
the CRUTEM4 (Climatic Research Unit, University of East Anglia) climate data
downloaded 5 January 2017 from http://www.cru.uea.ac.uk/data. Because precip- 
itation has higher uncertainty than temperature, and is only indirectly linked with 
the soil moisture that directly affects soil respiratory metabolism, we also exam- 
ined the ESA Soil Moisture Climate Change Initiative (CCI) ‘Combined’ dataset, 
Version 02.2, downloaded 6 June 2017 from http://data.ceda.ac.uk/neodc/esacci/ 
soil_moisture/data/daily_files/COMBINED/v02.2/. Using these data instead of the 
CRU precipitation data did not change the overall results, but the ESA-CCI dataset 
has poor or no coverage in cloudy tropical regions (see ref. 34 for example and also
http://www.esa-soilmoisture-cci.org/node/93), and when linked with the SRDB 
data under analysis here, 8% of data lacked soil moisture values, versus 5 (0.1%) 
lacking precipitation. We thus proceeded using precipitation only. 

The bulk density and soil organic C content variables of the SoilGrids 1 km 
dataset were downloaded on 8 January 2017 from https://soilgrids.com and used
for a simple computation of 1 m SOC stock (as bulk density times C concentra- 
tion). The geographic location of a few (around 10) soil respiration observations 
was missing in the SoilGrids dataset, and for these we substituted biome- and 
ecosystem-specific median values. 
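
The SOC stock computation and the median substitution described above can be sketched as follows. The units (bulk density in kg m⁻³, carbon content in g kg⁻¹) and the dictionary keys are assumptions made for this example; the text does not state them.

```python
from statistics import median


def soc_stock_kg_m2(bulk_density_kg_m3, carbon_g_per_kg, depth_m=1.0):
    """SOC stock (kg C m-2) as bulk density times C concentration times depth.

    Units are assumed for illustration: bulk density in kg m-3 and carbon
    content in g C per kg soil.
    """
    carbon_fraction = carbon_g_per_kg / 1000.0   # g kg-1 -> kg kg-1
    return bulk_density_kg_m3 * carbon_fraction * depth_m


def fill_missing_with_group_median(records):
    """Replace missing 'soc' values with the median of the record's group.

    `records` is a list of dicts with hypothetical keys 'group' (for example
    a biome/ecosystem label) and 'soc' (None when missing).
    """
    by_group = {}
    for r in records:
        if r["soc"] is not None:
            by_group.setdefault(r["group"], []).append(r["soc"])
    filled = []
    for r in records:
        soc = r["soc"]
        if soc is None and r["group"] in by_group:
            soc = median(by_group[r["group"]])
        filled.append({**r, "soc": soc})
    return filled
```

Under these assumed units, a soil with a bulk density of 1,300 kg m⁻³ and 20 g C kg⁻¹ holds about 26 kg C m⁻² in the top metre.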

We used two global GPP datasets: the 1982-2011 MTE¹⁰ GPP dataset, downloaded
on 5 January 2017 from https://www.bgc-jena.mpg.de/bgi/index.php/Services/Overview,
and the 2000-2015 MODIS GPP dataset¹¹, downloaded on
6 January 2017 from http://www.ntsg.umt.edu/project/modis/mod17.php. GPP
data were used instead of NPP data because of the higher errors in the latter
due to uncertainties surrounding autotrophic respiration fluxes. We also extracted
grid-cell-specific mean GPP from outputs of 8 global models in the ISIMIP²³
project, which uses community-agreed sets of scenarios with standardized climate
variables and socio-economic projections.

For a more site-specific measure of GPP and C exchange, ‘Tier 1’ FLUXNET2015
data were downloaded on 30 January 2017 from
http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/ and filtered for quality
(NEE_VUT_REF_QC > 0.5). FLUXNET GPP was linked to a given Rs and/or Rh
measurement if both measurements occurred within 5 km, in the same ecosystem
type, and in the same year.
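
The linking rule above (within 5 km, same ecosystem type, same year) can be sketched as a distance check plus two equality tests. The great-circle (haversine) formula and the dictionary keys below are assumptions of this example; the text does not say how distance was computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for the haversine distance


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_match(resp, tower, max_km=5.0):
    """True if a respiration record and a tower site-year can be linked.

    Both arguments are dicts with hypothetical keys 'lat', 'lon',
    'ecosystem' and 'year'.
    """
    return (resp["year"] == tower["year"]
            and resp["ecosystem"] == tower["ecosystem"]
            and haversine_km(resp["lat"], resp["lon"],
                             tower["lat"], tower["lon"]) <= max_km)
```

A respiration plot about 1 km from a tower therefore links to it, whereas one roughly 11 km away does not, even if year and ecosystem type agree.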

This work used eddy-covariance data acquired and shared by the FLUXNET 
community, including these networks: AmeriFlux, AfriFlux, AsiaFlux, 
CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-
Canada, GreenGrass, ICOS, KoFlux, LBA, NECC, OzFlux-TERN, TCOS-Siberia
and USCCC. The ERA-Interim reanalysis data are provided by ECMWF and
processed by LSCE. The FLUXNET eddy-covariance data processing and har- 
monization was carried out by the European Fluxes Database Cluster, AmeriFlux 
Management Project, and Fluxdata project of FLUXNET, with the support of 
CDIAC and ICOS Ecosystem Thematic Center, and the OzFlux, ChinaFlux and 
AsiaFlux offices. Data collected by Beringer in the OzFlux network was funded 
under an ARC FT (FT1110602). Specific FLUXNET sites and years used can be 
found in the Supplementary Information. 

Finally, we used the global Solar-Induced chlorophyll Fluorescence (SIF) dataset 
from the Global Ozone Monitoring Experiment-2 (GOME-2) instrument onboard 
the MetOp-A platform and the SCanning Imaging Absorption spectroMeter for 
Atmospheric CHartographY (SCIAMACHY). This chlorophyll fluorescence signal
is strongly linked with GPP (although the ratio of SIF to GPP varies by land
cover, and thus the SIF analysis is unable to capture changes in GPP due to land
cover change). The monthly mean quality-filtered and gridded GOME-2 and
SCIAMACHY SIF data (that is, SIF_GOME-2 and SIF_SCIAMACHY) were used to reduce
the high data noise. The gridded GOME-2 data are available at 0.5° resolution from
2007 to 2015, and the SCIAMACHY data at 1° from 2003 to 2011. These two datasets
are so far the longest global SIF records. Both are publicly available at
ftp://ftp.gfz-potsdam.de/home/mefe/GlobFluo/. For convenience in visualizing temporal
changes in Rh:GPP and Rh:SIF ratios, SIF data were rescaled to the range of GPP
(0-3,500 g C m⁻² yr⁻¹) in plots.
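
One simple way to rescale a series onto a target range is a linear min-max transform, sketched below. The text does not specify the exact rescaling used, so the min-max form and the function name are assumptions of this example.

```python
def rescale_to_range(values, new_min=0.0, new_max=3500.0):
    """Linearly map `values` so that min -> new_min and max -> new_max.

    The default target range mirrors the GPP range quoted in the text;
    min-max scaling itself is an assumption of this sketch.
    """
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("cannot rescale a constant series")
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * scale for v in values]
```

Because the transform is linear, it changes only the plotting scale of the SIF series and leaves the sign of any temporal trend unchanged.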

In general, our analysis assumed the following basic relationships between
various carbon cycle fluxes:

Total soil respiration is the sum of its (belowground) autotrophic and
heterotrophic components:

$R_{s} = R_{a(\mathrm{below})} + R_{h}$

Autotrophic respiration is comprised of above- and belowground components:

$R_{a} = R_{a(\mathrm{above})} + R_{a(\mathrm{below})}$

NPP is the balance of GPP and autotrophic respiration:

$\mathrm{NPP} = \mathrm{GPP} - R_{a}$

In the absence of disturbance, NEE is equivalent to NPP minus Rh:

$\mathrm{NEE} \approx \mathrm{NPP} - R_{h}$
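
As a worked step, the four relationships above can be chained to express NEE in terms of GPP, aboveground autotrophic respiration and total soil respiration. This is only an algebraic consequence of the stated relationships, written with the same flattened subscripts (s, a, h) as the text:

```latex
\begin{align}
\mathrm{NEE} &\approx \mathrm{NPP} - R_{h} \\
             &= \mathrm{GPP} - R_{a} - R_{h} \\
             &= \mathrm{GPP} - R_{a(\mathrm{above})} - R_{a(\mathrm{below})} - R_{h} \\
             &= \mathrm{GPP} - R_{a(\mathrm{above})} - R_{s}
\end{align}
```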


Statistical analysis. Some data were excluded a priori (that is, before constructing 
the analysis). These included: (i) studies performed before 1989, roughly when 
infrared gas analysers began to be widely used; (ii) several extremely high MODIS
GPP values (>10 kg C m⁻² yr⁻¹); (iii) 27 points with Rs:GPP_FLUXNET > 5.

We used linear regression, weighted by years of Rh or Rs observation, to examine
changes over time in the ratios of Rh:Rs, Rh:GPP, Rs:GPP, Rh:SIF and Rs:SIF, as well
as the influence of climate on Rh. For each analysis detailed in the text, a full model
was fitted for the variable of interest (for example, Rh:Rs) that controlled for
recent disturbance (the SRDB ‘Stage’ field, aggrading or mature), temperature and
precipitation normals, and land cover; year of study had a first-order interaction
with all these independent variables. Land-cover groupings included deciduous
forests, n = 782; evergreen forests, n = 1,058; grasslands, n = 270; savannahs,
n = 116; and other, n = 321. Both MAP and its square (that is, MAP²) were
included. All models were examined for influential outliers and deviations
from normality; no transformation of dependent variables was performed.
Non-significant terms were then eliminated using a forward-and-back stepwise
algorithm (using the R package ‘MASS’, version 7.3-47) based on the Akaike
Information Criterion. Generally the text reports F statistics and P values from
the analysis of variance (ANOVA) results of these linear models. A Theil-Sen
estimator was also computed for each temporal trend, independent of the linear
regressions, using the ‘mblm’ R package version 0.12.
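
To illustrate the two trend estimators named above, the sketch below implements a weighted least-squares slope and a plain Theil-Sen slope (the median of all pairwise slopes). It is a minimal stand-alone illustration. It does not reproduce the R packages used in the study or the full model with covariates and interactions.

```python
from itertools import combinations
from statistics import median


def weighted_ols_slope(x, y, w):
    """Slope of a weighted least-squares line y ~ a + b*x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def theil_sen_slope(x, y):
    """Theil-Sen slope: the median of slopes over all pairs of points."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    return median(slopes)
```

With a single outlying year in an otherwise steady series, the least-squares slope is pulled towards the outlier while the Theil-Sen slope stays close to the underlying trend, which is why the latter is a useful independent check.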

A linear model relating Rh to annual air temperature (°C), precipitation (mm)
and its square (to allow for a nonlinear response), potential evapotranspiration,
recent disturbance, and leaf habit was fitted (adjusted R² = 0.25,
RSE = 23.3 g C m⁻² yr⁻¹) and used to estimate global Rh. Global fluxes were based
on 1989-2014 HadCRUT4 data and projected onto a 0.5° grid (grid area data
downloaded from http://eos-webster.sr.unh.edu/).

About one-third of the SRDB Rs data have associated uncertainties (that is, an
annual flux reported as X ± Y) due to measurement error, spatial variability, and so
on. The mean coefficient of variability for these Rs errors was 15%, and the median
coefficient of variability was 10%. We did not attempt to propagate these
uncertainties through our analysis, given their variable origin and incomplete nature,
but a more complete treatment of uncertainty will be important for testing the
robustness of our results in the future.
Sensitivity analyses. We performed a number of tests to assess the robustness 
of our findings. To test whether the findings in Fig. 1 were an artefact of the 
1989-2014 period chosen, the model summarized in Table 1 was repeatedly fitted 
to the SRDB dataset, as filtered with various first and final year assumptions (that 
is, the earliest and latest measurement allowed). We found that these results were 
not sensitive to choice of first and last years unless the timespan of the data was 
dramatically shortened (by at least 50%; Extended Data Fig. 1). 

In addition, because satellites probably miss some large fraction of climate-driven
terrestrial GPP increases, we examined the robustness of the results in
Fig. 2a to this potential problem. First, we quantified the divergence between eddy
covariance (FLUXNET) and remotely sensed GPP (Extended Data Fig. 4). Second,
we asked how much GPP satellites would have to be missing to invalidate the
GPP-respiration trends shown in Fig. 2a, using a sensitivity analysis that assumed a
very conservative (that is, high) GPP trend of 0.5% yr⁻¹ and repeatedly re-fitted
the regressions shown in Fig. 2a assuming that satellites missed 0%, 10%, 20%,
... 100% of this gain (Extended Data Fig. 5). Third, grid-cell-specific mean GPP
was extracted from outputs of 15 global models in the ISIMIP²³ project (which
uses community-agreed sets of scenarios with standardized climate variables and
socio-economic projections, and whose GPP models include CO2 fertilization)
and compared to observed Rh (Fig. 2b).
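
The second test above can be sketched as a loop over assumed "missed" fractions. This example makes several simplifying assumptions that are not in the text: the missed gain is added back linearly relative to a hypothetical base year, the trend is an unweighted least-squares slope of the Rh:GPP ratio against year, and all function and parameter names are illustrative.

```python
def ols_slope(x, y):
    """Unweighted least-squares slope of y against x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def corrected_gpp(gpp, year, missed_fraction, base_year=1989, true_trend=0.005):
    """Add back the share of an assumed 0.5% per year GPP gain that was missed.

    A linear (non-compounding) correction relative to `base_year` is an
    assumption of this sketch.
    """
    return gpp * (1.0 + missed_fraction * true_trend * (year - base_year))


def ratio_trends(years, rh, gpp, fractions=None):
    """Trend in Rh:GPP for each assumed fraction of missed GPP gain."""
    if fractions is None:
        fractions = [i / 10 for i in range(11)]   # 0%, 10%, ..., 100%
    trends = {}
    for f in fractions:
        ratios = [r / corrected_gpp(g, yr, f)
                  for yr, r, g in zip(years, rh, gpp)]
        trends[f] = ols_slope(years, ratios)
    return trends
```

The larger the assumed missed fraction, the more the corrected GPP grows over time and the weaker any upward trend in the Rh:GPP ratio becomes. The question is whether the trend stays positive across the whole range.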

We also examined whether the inconsistency in longitudinal (site) results
(Fig. 2c) fell within the range of what might naturally occur, that is, whether the
variability was numerically consistent with the variability in global carbon flux
datasets. To do so, we calculated the chance of observing a statistically significant
Rh increase at some random location on Earth's surface over the last 25 years
using a global gridded Rh dataset, by testing for a significant temporal trend in
each of its 57,048 terrestrial grid cells. We calculated this probability to be 26%:
in other words, there would be only a 1-in-4 chance of observing a significant Rh
increase even with a quarter-century of perfect data (‘perfect’ in the sense of no
random measurement error, as would occur in reality), because the climate-driven
Rh increase is small compared to interannual Rh variation. The probability of
observing a statistically significant Rh increase using only the last 10 years’ data,
comparable to the record length of most of the 13 longitudinal sites (Fig. 2c), was
only 5%. A parallel exercise was performed with the global GPP datasets and
yielded similar results. We also compared the sites’ location in climate change
space with that of the overall Fig. 1 dataset (Extended Data Fig. 9).

Finally, we performed a bootstrap analysis to quantify whether the FLUXNET 
data subset (that included GPP measurements made in the same study site and 
year as site-specific Rs measurements in the SRDB) was adequate, as this was a 
small dataset (n = 106 site-years, 19 sites) dominated by one site (Harvard Forest,
with 56% of the available data). What was the likelihood that this dataset is simply
too small to detect signals of rising Rh, given site-to-site variability in climate
and carbon dynamics? To answer this we made different random draws from the 
original dataset, with 1,000 bootstrap draws per fraction of artificial no-trend data 
(Extended Data Figs. 7 and 8). 
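
The resampling idea behind this test can be illustrated with a generic nonparametric bootstrap of a trend slope. This sketch is deliberately simpler than the analysis described: it resamples site-years with replacement and does not mix in artificial no-trend data. The function names, the fixed seed and the percentile interval are choices made for this example.

```python
import random


def ols_slope(x, y):
    """Unweighted least-squares slope of y against x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def bootstrap_slopes(years, values, n_draws=1000, seed=0):
    """Slopes from `n_draws` resamples (with replacement) of the site-years."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    n = len(years)
    slopes = []
    while len(slopes) < n_draws:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [years[i] for i in idx]
        if len(set(xs)) < 2:           # slope undefined if all years coincide
            continue
        ys = [values[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    return slopes


def percentile_interval(slopes, alpha=0.05):
    """Simple percentile interval from the bootstrap distribution."""
    s = sorted(slopes)
    lower = s[int(len(s) * alpha / 2)]
    upper = s[int(len(s) * (1 - alpha / 2)) - 1]
    return lower, upper
```

If the resulting interval is wide enough to include zero even when a trend is known to be present, the dataset is too small to detect that trend reliably.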

Code and data availability. The R language and environment for statistical 
computing version 3.4.3 was used for all analyses. Code and data to reproduce
all results are available at https://github.com/bpbond/rh-changes. 

Extended Data Fig. 1 | Effect of dataset temporal span on Rh:Rs trend.
The model summarized in Table 1 was repeatedly fitted to the SRDB
dataset, as filtered with various first and final year assumptions (that is,
[…] were an artefact of the 1989-2014 period chosen, indicated by a black
diamond in the figure). Colour shows the significance of the Rh:Rs
temporal trend.

Extended Data Fig. 2 | SRDB coverage in terrestrial climate space. Blue […]
distribution in the HadCRUT4 data, with darker shades indicating more […]

Extended Data Fig. 3 | Respiration-to-production model residuals […]
respiration to GPP or SIF […] is related to the independent variables: climate,
land cover, disturbance and SOC content; for more details, see Methods.

[…] upscaled MTE dataset, and the remotely sensed MODIS product), and two
SIF sources (SCIAMACHY and GOME-2) are shown. Grey bands show
95% confidence intervals. Blue lines indicate least-squares trend, while […]

Extended Data Fig. 6 | Ratio of NEEnight to GPP in the FLUXNET 2015 […]
evergreen needleleaf forest (ENF), grassland (GRA), mixed forests (MF), […]

Extended Data Table 1 | Coefficients for general model of relationship between Rs and Rh

Coefficient    Estimate   Std. Error   t        P
(Intercept)    -27.771    7.653        -3.629   <0.001
log(Rs)          0.867    0.033        26.044   <0.001
Year             0.014    0.004         3.653   <0.001

A model of form ln(Rh) ~ ln(Rs)*Year was fitted (following the protocol in ref. 18) to the data shown in Fig. 1, with ‘Year’ being the numerical year of measurement. Coefficient name, value and standard
error, t value and P value are shown. Model residual standard error = 0.335, adjusted R² = 0.698, F2,317 = 396.6, P < 0.001, AIC = 212.745.

Blue boron-bearing diamonds from Earth’s lower mantle


Evan M. Smith, Steven B. Shirey, Stephen H. Richardson, Fabrizio Nestola, Emma S. Bullock, Jianhua Wang & Wuyi Wang


Geological pathways for the recycling of Earth’s surface materials 
into the mantle are both driven and obscured by plate tectonics.
Gauging the extent of this recycling is difficult because subducted 
crustal components are often released at relatively shallow depths, 
below arc volcanoes. The conspicuous existence of blue boron-
bearing diamonds (type IIb) reveals that boron, an element
abundant in the continental and oceanic crust, is present in 
certain diamond-forming fluids at mantle depths. However, both 
the provenance of the boron and the geological setting of diamond 
crystallization were unknown. Here we show that boron-bearing 
diamonds carry previously unrecognized mineral assemblages 
whose high-pressure precursors were stable in metamorphosed 
oceanic lithospheric slabs at depths reaching the lower mantle. We 
propose that some of the boron in seawater-serpentinized oceanic 
lithosphere is subducted into the deep mantle, where it is released 
with hydrous fluids that enable diamond growth. Type IIb
diamonds are thus among the deepest diamonds ever found and 
indicate a viable pathway for the deep-mantle recycling of crustal 
elements. 

Type IIb diamonds—including the Hope, a renowned blue 
diamond—are mantle-derived minerals that contain boron at 0.01- 
10 p.p.m. levels and show a lack of nitrogen absorption in infrared 
spectroscopy. Boron imparts their blue colour and p-type semiconductivity,
although they may not always appear blue when boron
concentrations are low or additional defects are present. Because 
boron is a quintessential crustal element with a low concentration in 
Earth’s mantle, blue diamonds and their formation have long been a
geochemical enigma. 

Type IIb diamonds have been recovered from worldwide localities,
including southern and central Africa, India, South America
and Borneo, having been brought to the surface in kimberlite
volcanoes ranging in age from the 1.15-billion-year-old Premier pipe
to the ~90-million-year-old Letseng deposit. They can reach large
sizes, such as the 176.2-carat Brazilia diamond, the 122.5-carat rough
diamond that yielded the (24.18-carat) Cullinan Dream, which was
examined as part of this study, and the 112.5-carat rough diamond
from which the Hope was cut.

The geological origin of blue diamonds has nevertheless remained 
unknown owing to their rarity (<0.02% of mined diamonds; 
see Methods), high value and general lack of mineral inclusions. To over- 
come this problem, prospective samples were screened from the extensive 
grading operations of the Gemological Institute of America. Over two 
years, this approach allowed examination of 46 type IIb diamonds with 
inclusions, an invaluable suite for analysis (Extended Data Fig. 1). 

Inclusions were characterized using Raman spectroscopy (Fig. 1) and 
were found to differ substantially from the common minerals found 
in diamonds from the cratonic lithosphere (<200 km), such as olivine 
and Cr-rich pyrope from peridotite or grossular-almandine-pyrope and 
omphacitic clinopyroxene from eclogite. Instead, the inclusion mineralogy
is typical of super-deep diamonds originating from the mantle transition
zone to the lower mantle (Extended Data Fig. 2). Inclusions
in sublithospheric diamonds tend to destabilize during ascent in the
mantle and break down to lower-pressure minerals, often unmixing
into composite assemblages. Many inclusions described here are
multiphase assemblages, as is the case with previously studied inclusions
in super-deep diamonds and their high-pressure experimental
analogues. It is implausible that the same multiphase assemblages
could be coincidentally replicated by random sampling of lower-pressure
mineral aggregates at shallower, lithospheric depths.

The most abundant inclusion identified, in 31 of 46 samples, was
Ca-silicate dominated by CaSiO3 walstromite, sometimes with larnite
(β-Ca2SiO4) and other phases of CaSiO3 composition (Extended Data
Table 1). These inclusions are commonly interpreted as retrogressed
CaSiO3 perovskite (Ca-Pv). As retrogression of pure Ca-Pv alone
should maintain a bulk Ca:Si ratio of 1, the presence of (Ca-rich) larnite
in some inclusions may indicate that diamond growth occurred in a
chemically evolving system with variable calcium enrichment, as seen
in other super-deep diamonds.

Other observed inclusions also correspond to retrogressed high- 
pressure minerals (Extended Data Table 1). For example, inclusions 
of orthopyroxene, with sharp Raman spectra matching enstatite, and 
minor amounts of coexisting olivine are interpreted as retrogressed 
bridgmanite, the lower-mantle Mg-silicate perovskite phase.
Multiphase inclusions containing ortho- or clinopyroxene, coexisting
with jeffbenite (Mg3Al2Si3O12) or spinel ((Mg,Fe)Al2O4), are interpreted
as aluminous bridgmanite, although some bearing clinopyroxene
may represent retrogressed majoritic garnet.

One inclusion provides a convincing example of retrogressed 
majorite. Being fortuitously exposed on a facetted diamond, the 
two-phase assemblage of NaAl-clinopyroxene and jeffbenite (Fig. 1b) 
was confirmed with microanalysis by energy-dispersive X-ray spectros- 
copy (Extended Data Fig. 3) and was interpreted as a former low-Ca, 
high-Na majoritic garnet. A separate inclusion of orthopyroxene in
this diamond, interpreted as former bridgmanite, would then make
a putative majorite-bridgmanite pair that would restrict its origin to
within ~660-750 km. Other observed inclusion phases are coesite
(with accessory kyanite, interpreted as former stishovite) as well as 
ferropericlase (found as relatively small, brown inclusions; see Extended 
Data Fig. 4 and Supplementary Table 1). 

Another diamond contains a multiphase inclusion dominated by 
nepheline and spinel, interpreted as former calcium-ferrite-type (CF) 
phase or possibly new aluminous (NAL) phase, which is compelling 
evidence of derivation from host rocks of basaltic composition at 
lower-mantle depths (Fig. 1). The same diamond also contains a
multiphase inclusion of Fe carbide, Fe sulphide and wüstite (Extended
Data Fig. 5, Supplementary Table 2) that does not correspond to a 
known mineral but may represent a former metallic melt similar to 
those recently discovered in (boron-lacking) CLIPPIR (Cullinan-like, 
Large, Inclusion-Poor, relatively Pure, Irregularly shaped and Resorbed) 
diamonds. Three other type IIb samples also contain metallic-looking,
magnetic inclusions that may be similar to those of CLIPPIR diamonds, 
although it should be stressed that these are a minor part of the type IIb 

Fig. 1 | Selected Raman spectra of inclusions in type IIb diamonds.
a, Former Ca-Pv, now CaSiO3 walstromite, in sample 110205945970.
b, Former majoritic garnet, now a composite of NaAl-pyroxene and
jeffbenite (verified by scanning electron microscopy/energy-dispersive
X-ray spectroscopy; see Extended Data Fig. 3) in sample 880000037816.
c, Former stishovite, now coesite, in sample 101024478345. Also shown on
the left is a composite coesite plus kyanite spectrum (green) from sample
890000180201. d, Former CF, now composite of nepheline and spinel
(with CH4 fluid, not shown), in sample 110208245246. Dashed lines are
reference spectra of CaSiO3 walstromite, jeffbenite, nepheline and
spinel, plus omphacite R061129 and coesite X050094 from the RRUFF
database. Spectra are stacked vertically for clarity. Source Data.


diamond inclusion suite, whereas CLIPPIR diamonds are dominated 
by metallic Fe-Ni-C-S inclusions.

The observed inclusion mineralogy and the absence of typical silicate 
inclusions that characterize diamonds from the continental lithospheric 
mantle advocate for type IIb diamond growth in host rocks of basaltic- 
to-peridotitic bulk composition, consistent with the lithologies in 
subducted oceanic lithosphere reaching lower-mantle depths. Although 
some samples, particularly those containing Ca-Pv alone, may have 
grown in the mantle transition zone, the inclusion assemblages with 
former bridgmanite, ferropericlase and CF phase require an origin in 
the lower mantle.

In addition to inclusion mineralogy, a sublithospheric origin for blue 
diamonds is physically required by the extreme remnant pressure in 
some inclusions (see Methods). Using the pressure-induced shift in 
Raman spectral features, a 4.4 ± 0.1 GPa remnant pressure was measured
in CaSiO3 walstromite, which requires entrapment at a pressure

Fig. 2 | Inclusions jacketed by thin films of fluid CH4 and H2, revealed
by Raman spectroscopy. a, CaSiO3 walstromite (former Ca-Pv),
with the main part of the inclusion circled, in sample 110208780369.
b, Orthopyroxene (former bridgmanite) in sample 110208773706.
Lobate sprays of small inclusions are thought to reflect expansion and
proliferation of inclusion material into its own decompression crack.


of at least ~9 GPa in the mantle (we note that these entrapment pressures
are severe underestimates due to diamond deformation; Extended
Data Fig. 6). Similarly, a coesite inclusion with its main Raman peak
shifted from 520.6 cm⁻¹ up to 537.9 ± 0.5 cm⁻¹ (Extended Data Fig. 6)
indicates extreme remnant pressure, far exceeding the 4 GPa highest-pressure
benchmark for coesites observed in lithospheric diamonds
(see Methods). X-ray diffraction also reveals remnant pressures of
~1.8 GPa in ferropericlase inclusions, requiring a minimum entrapment
pressure of 10.3 to 14.1 GPa, calculated at 1,200 K and 2,000 K,
respectively, corresponding to a depth beyond 300 km, well below the
base of the deepest continental lithospheric mantle keels.
Additional pressure observations come from the silicate inclusions 
in blue diamonds, which typically have large decompression cracks 
and often lobate sprays of tiny droplet-like satellite inclusions in 
healed fractures emanating from the main inclusion (Figs. 1, 2). These 
satellite inclusions formed during exhumation, as high internal inclusion
stresses relative to decreasing external confining pressure led
to rupturing of the host diamond. Pervasive dislocation networks in
type IIb diamonds (Extended Data Fig. 7) from plastic deformation and
recovery are also consistent with a high-temperature, sublithospheric
mantle origin. Equivalent dislocation networks also appear in sublithospheric
type IIa, CLIPPIR diamonds, but thus far have not been
reported in diamonds from the cooler, shallower lithospheric mantle. 
Methane, and often hydrogen (CH4 + H2), were detected by Raman
spectroscopy in 13 diamonds (28% of samples) as a thin fluid layer 
around one or more inclusions of varying mineralogy (Fig. 2). The fluid 
is a result of hydrogen escaping the inclusion and accumulating at the 
inclusion-host interface. When hydrogen is surrounded by diamond, 
it can form methane by reacting with the surrounding carbon. Similar 
CH4 + H2 fluid jackets have been found around metallic melt inclusions
in natural and synthetic diamonds, formed when formerly
dissolved atomic hydrogen diffused out of the inclusions upon cooling 
and decompression. In the present mineral inclusions, these CH4 + H2
fluids are a strong indication that at least some of the retrograde 
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Fig. 3 | Formation of type IIb diamond. (1) Seafloor hydrothermal 
circulation causes serpentinization, introducing boron into oceanic 
lithosphere. (2) Subduction and metamorphism of serpentine to DHMS. 
(3) Breakdown of DHMS yields hydrous, boron-enriched fluid that 
migrates and evolves. It may gather carbon from the altered oceanic 
lithosphere. (4) Crystallization of boron-bearing diamond, triggered 

by redox reactions or in response to changing pressure, temperature or 
composition of the fluid. (5) Vertical transport may involve localized 
buoyancy associated with diamond-related metasomatism or an external 
mechanism such as a plume, with ultimate exhumation to the surface due 
to kimberlite volcanism. 
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mineral assemblage is hydrogen-saturated, implying that the original 
high-pressure minerals interacted with hydrous media. 

Blue diamonds exhibit an inclusion mineralogy that requires 
diamond formation in host rocks of basaltic-to-peridotitic bulk com- 
position in the lower mantle, much like inclusions described in other 
lower-mantle diamonds that have been linked to subducted oceanic 
lithosphere. Inclusions of basaltic association, such as former 
stishovite or CF phase, clearly point to the involvement of subducted 
ocean crust, whereas those of peridotitic association, such as 
ferropericlase or low-Al bridgmanite, may be more closely related to the 
peridotitic portion of the oceanic lithospheric slab. These 
subduction-related mineral assemblages are supported by a range of 
light-to-heavy carbon isotope signatures in type IIb diamond, with three 
samples having a δ13C of −13.4‰, −3.4‰ and −1.8‰ (δ13C is the 
parts-per-thousand deviation of 13C/12C from the Pee Dee belemnite standard; 
see Supplementary Table 3), which complement three diamonds from a 
previous study with a δ13C range of −20.8‰ to −14.5‰. 
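
The δ13C definition quoted above can be written out as a short calculation. The following is a minimal sketch only: the function name and the reference ratio `R_STD` are illustrative assumptions and are not taken from the paper or its supplementary data.

```python
def delta13c_permil(ratio_sample: float, ratio_standard: float) -> float:
    """Parts-per-thousand deviation of a 13C/12C ratio from a standard ratio."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


# Hypothetical 13C/12C ratio for the standard, used only to make the
# example runnable; it is an assumption, not a value given in the text.
R_STD = 0.0112

# Back-compute sample ratios that would give the three values reported
# in the text, then recover the delta values from those ratios.
for reported in (-13.4, -3.4, -1.8):
    r_sample = R_STD * (1.0 + reported / 1000.0)
    print(f"13C/12C = {r_sample:.7f} -> d13C = {delta13c_permil(r_sample, R_STD):.1f} per mil")
```

Because δ13C is a relative deviation, the result does not depend on the particular reference ratio chosen, only on the sample-to-standard ratio.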

Boron is scarce in the convecting mantle, being 100 times more 
depleted compared to Earth’s surface. Its occurrence in blue diamonds 
implies an anomalously boron-enriched mantle source, especially 
considering that boron behaves incompatibly in diamond growth 
experiments. Although present observations do not rule out boron 
contribution from the ambient convecting mantle, the clear 
subducted-host-rock signatures strongly suggest that the boron is 
surface-derived, even if only small amounts of the full boron budget of 
the slab are subducted. Boron isotopic measurements may clarify its origin, but 
the prerequisite for non-destructive analysis, the lack of standards and 
the low boron concentrations in type IIb diamonds have been barriers 
to the use of current techniques. Nevertheless, support for a crustal 
boron source comes from the actual distribution of boron on Earth, 
the clear geophysical evidence of subducted slabs reaching the lower 
mantle and the already established link between certain lower-mantle 
diamonds and subducted oceanic crust. 

In deeply subducted oceanic lithosphere, the most likely boron 
source is metamorphosed serpentinized peridotite, with boron originally 
introduced by hydrothermal seawater circulation (Fig. 3). 
Other boron carriers, such as sediments, organic matter or white 
micas in the ocean crust, are destabilized at relatively shallow depths, 
but serpentinized peridotite in the lithospheric-mantle portion of the 
slab can serve as an effective vehicle and is expected to be the largest 
boron reservoir within the slab. Like water, boron is structurally 
hosted in serpentine. Under the right conditions, cool subduction 
geotherms could successfully stabilize dense hydrous magnesium silicates 
(DHMSs) before serpentine breakdown, thereby creating a mineralogical 
pathway for boron, water and other components to reach 
the mantle transition zone and beyond. The depth range of blue 
diamond genesis may therefore relate to DHMS breakdown, releasing 
hydrous diamond-forming fluid as well as liberating boron (Fig. 3). 
Eventual transport of the blue diamonds to the surface is thought to 
involve upwelling mantle and kimberlite volcanism, as with other 
super-deep diamonds. 

Blue diamonds point to a subduction-driven geochemical pathway 
extending from serpentinized oceanic lithosphere at Earth's surface to 
the lower mantle. This picture is consistent with key features of blue 
diamonds: mineral inclusions of basaltic-to-peridotitic bulk composition 
with affinity to subducted slabs; high-pressure mineral 
assemblages indicating a depth of origin reaching the lower mantle, 
coinciding with the projected breakdown of DHMS minerals; 
CH4 + H2 fluids suggestive of hydrous media; and, most remarkably, 
the boron content of the diamonds themselves. If metamorphosed 
serpentinite ultimately provides the boron for type IIb diamonds, as 
proposed here, then it is also implicated as a carrier of water to the 
deep mantle. The recognition that blue diamonds originate from 
the lower mantle highlights a possible major pathway for ultra-deep 
water recycling on Earth. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0334-5. 

METHODS 


Raman spectroscopy. Raman spectroscopy was performed at the Gemological 
Institute of America (GIA), New York, with a Renishaw InVia Raman Microscope, 
using a grating with 1,800 lines mm−1, a 50× lens with a numerical aperture of 
0.55, and the 514.5 nm (and 488 nm for additional analyses) output of an 
argon-ion laser set to 150 mW output power. Calibration was done using the 520.5 cm−1 
Raman peak of a polished silicon wafer. 

Energy dispersive X-ray spectroscopy, electron microscopy and cathodolumi- 
nescence imaging. Qualitative scanning electron microscopy/energy-dispersive 
X-ray spectroscopy (SEM-EDS) examination was carried out at GIA, New York, 
using a Zeiss EVO MA10 SEM system with an Oxford X-MaxN 20 EDS detector, 
in variable-pressure mode with chamber pressure of 20 Pa and electron beam 
conditions of 10-20 kV and 1 nA. The same instrument was used to capture 
secondary-electron and backscatter images of exposed inclusions, with a chamber 
pressure of 20 Pa and electron beam conditions of 15 kV and 1 nA, as well as 
cathodoluminescence images (black and white, panchromatic), with a chamber 
pressure of 20 Pa and electron beam conditions of 15 kV and 0.5 nA. 

Electron microprobe analysis. Electron microprobe analyses were performed at 
the Geophysical Laboratory of the Carnegie Institution for Science on a JEOL JXA- 
8530F field-emission electron probe. The operating conditions were 15 kV and 
30 nA. Samples were coated with carbon. For the carbide analyses, Fe7C3, NiS and 
an Al-bearing enstatite glass were used as standards. The Fe7C3 standard was a 
synthetic high-pressure iron carbide sample prepared at the University of Michigan. 
X-ray diffraction. X-ray diffraction analyses were done in the Department of 
Geosciences, University of Padova using a custom instrumentation setup including 
a high-brilliance MoKα micro-beam source operating at 50 kV and 0.8 mA, a 
Rigaku Oxford Diffraction SuperNova goniometer and a Dectris Pilatus 200K 
detector. The setup allows measurement of inclusions of very limited size, down 
to 5–10 μm, that are still in situ within diamonds. Although two samples were 
examined (110208245246 and 110208425476), owing to the difficulty in aligning 
the microbeam with individual inclusions, only the latter sample successfully 
produced reflections from inclusion phases. 

Mass spectroscopy. Carbon isotopic analysis was done at the Department of 
Terrestrial Magnetism of the Carnegie Institution for Science on a CAMECA 6F 
ion microprobe using an extreme energy filtering method. A 10 kV primary Cs 
beam of 5 nA was focused to ~15 μm and the beam centre was rastered across a 
20 μm × 20 μm area on the sample, held at −4,650 V. Secondary ions of 12C and 
13C were extracted at −5 kV with an energy window of 100 V. The 12C ions were 
counted for 1 s and the 13C ions for 5 s, and counting was repeated 200 times. The 
2σ counting precision for 13C was about 0.5‰. The diamond standards N198, 
4139, 1013, ‘Mao’ and JWH-NS were used to determine the instrumental mass 
fractionation and drift before and after sample analyses. 

Prevalence of type IIb diamonds. Type IIb diamonds are recognized as being 
relatively rare, but their rarity is seldom—if ever—defined quantitatively. To deter- 
mine a quantitative figure, a review was conducted for a sample set of 13.8 million 
natural gem-quality diamonds submitted to GIA. This sample set excludes syn- 
thetic diamonds, which are often of type IIb and might otherwise skew the results. 
Within the sample set of natural gem diamonds, 0.02% were recorded as being 
type IIb. We note that this figure is considered a robust cross-section of natural 
gem diamonds, but it comes with some uncertainty. A small number of the dia- 
monds surveyed may be duplicates, submitted for grading multiple times within 
the sample period; this may disproportionately affect the count of type IIb samples, 
especially for higher-valued blue gems. Also, the tally of type IIb diamonds may be 
inflated because they are more likely to be submitted to GIA for grading because 
of their colour, and thus their higher gem value. Conversely, a small proportion 
of type IIb diamonds may not have been officially recorded as type IIb in the 
database used, and by default would be counted as non-type IIb. Lastly, and most 
importantly, the above value is expressly for gem diamonds, and does not account 
for industrial goods. Given the colour, morphology and clarity characteristics of 
known type IIb diamonds, it is considered unlikely that many industrial-quality 
natural diamonds are of type IIb. Because about 70%-80% of mined diamonds are 
of industrial quality, the proportion of all diamonds (gem plus industrial) that are 
of type IIb may be as low as 0.004%. 
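
The arithmetic behind the 0.004% figure can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, and it assumes, as the text suggests is likely, that essentially no industrial-quality diamonds are of type IIb.

```python
# 0.02% of natural gem-quality diamonds were recorded as type IIb (from the text).
GEM_TYPE_IIB_FRACTION = 0.02 / 100


def overall_type_iib_fraction(industrial_share: float) -> float:
    """Fraction of all mined diamonds (gem plus industrial) that are type IIb.

    Assumes industrial-quality stones contribute no type IIb diamonds,
    so only the gem-quality share of production counts.
    """
    gem_share = 1.0 - industrial_share
    return GEM_TYPE_IIB_FRACTION * gem_share


# The text quotes an industrial share of about 70%-80% of mined diamonds.
for share in (0.70, 0.80):
    percent = overall_type_iib_fraction(share) * 100
    print(f"industrial share {share:.0%}: type IIb is about {percent:.3f}% of all diamonds")
```

With an 80% industrial share the estimate reaches the lower bound of 0.004% quoted in the text; a 70% share gives about 0.006%.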

Samples. A list of all samples examined and the inclusion phases observed within 
them is given in Extended Data Table 1. We note that a diamond may contain 
multiple inclusions of the same mineralogy and that the list does not necessarily 
reflect the number of inclusions found in each sample. However, the abundance 
of a certain kind of inclusion within individual diamonds compares very well with 
the frequency of those inclusions across the whole suite. More important is the 
inclusion assemblage and the potential host rock paragenesis that it portrays. A 
visual overview of the sample suite is given in Extended Data Fig. 1. 

Metallic Fe-S-C-O multiphase inclusion. Sample 110208245246 contains an 
opaque inclusion (inclusion B) whose mineralogy was unclear from the initial 
Raman analysis. Subsequent polishing to expose this inclusion for examination by 
electron microscopy revealed it to be a Fe-S-C-O multiphase inclusion, with 
three main phases: Fe sulphide, Fe carbide, and Fe oxide (Extended Data Fig. 5). 
These phases were also analysed using an electron microprobe (Supplementary 
Tables 1 and 2). The Fe sulphide matches pyrrhotite by composition, Fe1−xS, 
with minor variable surface tarnish smeared across the polished surface of the Fe 
sulphide, thought to be an artefact of the polishing process (see mottled texture of 
Fe sulphide phase in the oxygen X-ray map; Extended Data Fig. 5). The Fe oxide 
composition matches well with wüstite, Fe1−xO. The Fe carbide phase was verified 
to contain carbon, but quantification was complicated by the additional carbon 
signal from the carbon coat (and potentially the underlying and neighbouring host 
diamond material). The carbide phase is tentatively interpreted as an Eckstrom- 
Adcock carbide (ideally Fe7C3), although cohenite (Fe3C) is not ruled out without 
further analysis. This metallic Fe-S-C-O multiphase inclusion is similar to, but 
still distinct from, the Fe-Ni-C-S inclusions recently reported from CLIPPIR 
diamonds (a variety of large, inclusion-poor, often type IIa, sublithospheric 
diamonds). In the samples examined, metallic Fe-S-C-O inclusions are a relatively 
rare occurrence, as opposed to Fe-Ni-C-S inclusions, which are the most common 
inclusion type seen in CLIPPIR diamonds. 

Sample containing ferropericlase (plus olivine and nyerereite), verified by X-ray 
diffraction. Sample 110208425476 contains multiple dark inclusions. These inclu- 
sions are unusual compared to those in the rest of the samples because they are 
very irregular in shape, distribution and abundance within the sample. These dark 
inclusions are accompanied by smaller (down to micrometre size) inclusions defin- 
ing curviplanar features interpreted as healed cracks. Where these features intersect 
the polished surfaces, the polish is interrupted slightly, as would be expected for the 
slight crystallographic misalignment of a healed crack. In cathodoluminescence 
imaging, these healed cracks appear as bright lines, which are intimately connected 
within the diamond’s strong overall dislocation network pattern (Extended Data 
Fig. 4). The cathodoluminescence image reveals complex features texturally con- 
sistent with intense bulk plastic/brittle deformation. 

Initial Raman spectroscopy revealed some inclusions to be (or at least contain) 
ferropericlase, according to two broad Raman bands at around 650–690 cm−1 and 
190 cm−1, which correspond to ferropericlase in other diamonds in our experience. 
Later, polishing down to some of the inclusions revealed examples of ferropericlase, 
which were analysed with an electron microprobe (Supplementary Table 1). 

Many inclusions, however, did not yield any clear Raman signal. The ferropericlase 
inclusions (Extended Data Fig. 4) may have been modified during diamond 
deformation, especially if large cracks developed that allowed melt or fluid to 
penetrate and alter or augment the inclusion mineralogy. Examination with X-ray 
diffraction (University of Padova) revealed the presence of not only ferropericlase 
(unit cell edge length, 4.209(8) Å; volume, 74.5(3) Å3; uncertainties are estimated 
standard deviations per convention), but also olivine (unit cell dimensions, 
4.771(4) Å, 10.266(6) Å, 6.004(3) Å; angles, 90.0°, 90.0°, 90.0°; volume, 294.0(3) Å3), 
and nyerereite, Na2Ca(CO3)2 (observed d-spacings are 3.22, 6.42, 1.79, 1.52 and 
potentially others that overlap with diamond reflections). Some additional reflec- 
tions were observed that could not be conclusively assigned: 1.44 (very intense), 
2.04 (very intense, not overlapped with diamond), 2.40 (medium intensity), 2.86 
(low intensity) and very-low-intensity peaks at 1.84, 1.70 and 1.75. It is possible 
that the first four reflections (1.44, 2.04, 2.40 and 2.86) correspond to ringwoodite, 
a polymorph of olivine stable in the lower half of the mantle transition zone. The 
relatively large measured olivine unit cell certainly suggests Mg-poor compositions 
(Mg/(Mg+Fe) < 0.80). Accounting for even a modest remnant pressure of 
0.5–0.6 GPa would imply an olivine composition of Fo73 (Mg/(Mg+Fe) = 0.73), 
which is far from typical olivine-inclusion compositions, but overlaps with the 
composition reported for a ringwoodite inclusion with Mg/(Mg + Fe) = 0.75951 
(ref. 29). It is therefore considered possible that this sample may contain both 
olivine from inverted ringwoodite, as well as preserved ringwoodite. Again, it 
should be noted that the healed cracks raise the possibility that some inclusion 
material could have been introduced post-growth, meaning that ferropericlase 
may not necessarily have been in equilibrium with other phases, nyerereite and 
olivine. For this reason, and given that ferropericlase was observed in other 
samples, only ferropericlase is discussed and considered as a primary mineral in 
the main text. 
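
As a consistency check on the reported unit cells, the cell volumes can be recomputed from the quoted cell edges. The sketch below is illustrative only; the function names are hypothetical, and it assumes a cubic cell for ferropericlase and an orthorhombic cell (all angles 90°) for olivine, as the quoted parameters imply.

```python
def cubic_cell_volume(a: float) -> float:
    """Volume (cubic angstroms) of a cubic unit cell with edge a (angstroms)."""
    return a ** 3


def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Volume (cubic angstroms) of a cell with edges a, b, c and all angles 90 degrees."""
    return a * b * c


# Cell edges quoted in the text, in angstroms.
v_fper = cubic_cell_volume(4.209)
v_olivine = orthorhombic_cell_volume(4.771, 10.266, 6.004)

print(f"ferropericlase: {v_fper:.2f} cubic angstroms (reported 74.5(3))")
print(f"olivine:        {v_olivine:.2f} cubic angstroms (reported 294.0(3))")
```

Both recomputed volumes fall within the quoted estimated standard deviations of the reported values.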
Raman- and X-ray-diffraction-based inclusion barometry. Within diamonds, 
the pressure inside an inclusion is often elevated. Although the extreme confining 
pressure of the deep mantle is removed as the diamond is carried to the surface, 
the inclusion remains confined within the diamond host and may still have some 
remnant pressure. The expected remnant pressure is primarily dependent on 
the inclusion mineral species and depth (pressure and temperature conditions) 
of entrapment. In reality, most inclusions have at least some reduction in their 
remnant pressure due to diamond deformation (brittle/plastic) relieving some 
built-up pressure. Determining the remnant pressure in an inclusion can give a 
physical indication of the minimum depth of entrapment, constraining the depth 
of diamond growth. 

Raman spectroscopy is based on the energy associated with inducing molec- 
ular vibrations, which in some instances can be sensitive to stresses acting upon 
the material. This means that the Raman spectrum of some materials exhibits 
measurable, predictable changes under pressure. The Raman peaks of minerals, 
for example, can be slightly shifted from their expected values when the minerals 
are under pressure. 

In the suite of type IIb diamonds, two inclusions were found to preserve extreme 
remnant pressure. The first inclusion is a CaSiO3 walstromite inclusion, which, 
although it is interpreted to be an inversion product from Ca-Pv, still preserves 
remnant pressure that can help retrace the final stages of exhumation from the 
mantle. The inclusion, in sample 110203744064, has its 977.3 cm−1 peak 
(unconfined position) shifted upwards to 1,000.4 ± 0.5 cm−1 (Extended Data Fig. 6), 
which corresponds to a remnant pressure of 4.4 ± 0.1 GPa, using the mineral- 
and peak-specific experimental calibration factor of 5.16 ± 0.9 cm−1 GPa−1. 
Assuming any reasonable mantle temperature within 1,200–2,000 K, this 
remnant pressure would have required a minimum inclusion entrapment pressure of 
~9 GPa (approximately 240–280 km deep). 
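
The conversion from Raman peak shift to remnant pressure described above can be illustrated with a simple linear estimate. This is a minimal sketch under the assumption that pressure scales linearly with peak shift through the quoted calibration factor; the function name is hypothetical and the authors' exact procedure may differ.

```python
def remnant_pressure_gpa(observed_cm1: float, unconfined_cm1: float,
                         shift_per_gpa: float) -> float:
    """Linear estimate of remnant pressure (GPa) from a Raman peak shift.

    observed_cm1:   measured peak position (cm^-1)
    unconfined_cm1: peak position at zero pressure (cm^-1)
    shift_per_gpa:  calibration factor (cm^-1 per GPa)
    """
    return (observed_cm1 - unconfined_cm1) / shift_per_gpa


# Values quoted in the text for the CaSiO3 walstromite inclusion.
pressure = remnant_pressure_gpa(1000.4, 977.3, 5.16)

# Propagate only the +/-0.5 cm^-1 uncertainty on the measured peak position.
pressure_uncertainty = 0.5 / 5.16

print(f"remnant pressure: {pressure:.2f} +/- {pressure_uncertainty:.2f} GPa")
```

This simple estimate gives roughly 4.5 GPa, close to the reported 4.4 ± 0.1 GPa; the small difference may reflect rounding or details of the calibration that are not given in the text.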

Similarly, in sample 890000180198, the main Raman peak of coesite is shifted 
from its unconfined value of 520.6 cm−1 to 537.9 ± 0.5 cm−1 (Extended Data 
Fig. 6). Using the pressure calibration factor of 2.9 cm−1 GPa−1 gives a nominal 
remnant pressure of 6.0 ± 0.4 GPa, but this is not considered a reliable pressure 
determination. Other peaks in the spectrum do not exhibit the same magnitude 
of pressure-induced shift, which suggests anisotropic stresses. Caution is needed 
in the interpretation of remnant pressure in coesite owing to the complex effects 
of anisotropy, anomalies in its elastic behaviour, plus the lack of reliable high- 
pressure and low-to-high-temperature experiments for coesite. A non-cubic mineral 
trapped in a cubic (almost elastically isotropic) host such as diamond will be 
subject to anisotropic strains during exhumation from the mantle and will develop 
deviatoric stresses. Currently there is no reliable method among existing analytical 
techniques and hydrostatic calibrations to determine strains on minerals subject 
to deviatoric stresses and interpret them in terms of pressure. X-ray diffraction 
could be a useful method to supplement Raman spectroscopy for this purpose, but 
unfortunately it cannot be performed on small inclusions to the required level of 
precision and accuracy. These factors limit the interpretation of the Raman results 
from this inclusion. A minimum entrapment pressure is therefore not derived 
here. Nevertheless, the remnant pressure suggested by the main peak of coesite far 
exceeds those of other coesite inclusions found in lithospheric diamonds, which 
reach upwards of 3.6–4.3 GPa. 

In addition to Raman spectroscopy, X-ray diffraction revealed high remnant 
pressure inside ferropericlase inclusions in one of the studied type IIb diamonds, 
sample 110208425476. X-ray diffraction permitted the calculation of the unit cell 
(74.5(3) Å3), which, when combined with the composition obtained later by an 
electron microprobe (Mg/(Mg+Fe) = 0.91), gives a remnant pressure of 1.8 GPa. 
At mantle temperatures of 1,200-2,000 K, this is estimated to require entrapment 
at pressures of at least 10.3–14.1 GPa, not accounting for the stress relieved by 
brittle and plastic strains. 

For all three inclusions, CaSiO3 walstromite, coesite and ferropericlase, the 
estimated entrapment conditions are considered as minimum pressures. Much of 
the inclusion pressure has been relieved by brittle and plastic deformation of the 
surrounding diamond. Taken together, these results require that the diamonds 
originated from below the lithosphere. 
Data availability. All relevant data are presented in Extended Data Figs. 1-7, 
Extended Data Table 1 and Supplementary Information Tables 1-3. Original 
spectral data and electron microprobe data are available from the corresponding 
author. 
Extended Data Fig. 1 | Suite of 46 type IIb diamonds studied. Images are not to scale. Refer to noted dimensions (max diam., maximum diameter). 

Samples shown, with noted dimensions: 
110208245246 (2.76 mm long); 880000597423 (3.07 mm long); 101024478345 (7.38 mm long); 890000180201 (2.86 mm max diam.); 110206933429 (7.29 mm long); 880001287015 (6.64 mm long); 890000508981 (5.08 mm long); 
110208425476 (1.95 mm long); 100521978876 (4.77 mm long); 890000180198 (2.42 mm max diam.); 110208086814 (3.22 mm max diam.); 110205945970 (4.23 mm max diam.); 880002502964 (3.97 mm max diam.); 880000037816 (4.76 mm long); 
110208122379 (12.92 mm wide); 110108200990 (12.58 mm long); 110207972434 (14.88 mm long); 890000188195 (9.86 mm long); 110203744064 (11.15 mm long); 890000562519 (18.46 mm long); 110208093607 (19.99 mm long); 
110206861587 (5.54 mm wide); 100717827346 (5.46 mm max diam.); 110208258915 (7.66 mm long); 890000700302 (7.73 mm long); 110207965506 (4.28 mm long); 110207872703 (16.69 mm long); 110208241567 (9.03 mm long); 
110208768050 (5.24 mm long); 110208948788 (6.08 mm long); 110208247369 (11.03 mm long); 110208579703 (10.76 mm long); 110208423120 (13.83 mm long); DVBT (2.78 mm wide); 
110208084323 (9.82 mm long); 110208780369 (14.50 mm long); 110208773706 (4.21 mm max diam.); 110208135763 (9.85 mm long); 890000076656 (5.00 mm max diam.); 110207974941 (11.67 mm field of view); 
110207974942, 110207974945, 110207974947, 110207974948, 110207974949 and 110207974950 (each 11.67 mm field of view). 
Extended Data Fig. 2 | Mineralogy of mantle rocks with peridotitic and 
basaltic bulk composition as a function of depth. Numbers in boxes 
denote the number of diamonds observed with inclusions of the given 
phase, and blue shading in the boxes indicates that a thin CH4 + H2 fluid 
jacket was found with the phase. We note that the division of samples 
between the left and right panel is for illustrative purposes, and in reality 
some samples (for example, with Ca-Pv alone) are not firmly categorized. 
See Extended Data Table 1 for a breakdown of inclusions by sample. 
Adapted from ref. 17. 
Extended Data Fig. 3 | Multiphase inclusion interpreted as former 
low-Ca, high-Na majoritic garnet. a, Optical microscope image of 
the inclusion, exposed on a polished facet of sample 880000037816. 
b, Secondary-electron image of the same inclusion, grooved with nearly 
horizontal polishing lines. c, d, EDS spectra of the two phases, consistent 
with their Raman identification as jeffbenite and NaAl-pyroxene 
(monoclinic, with composition between enstatite and jadeite). High Na 
content suggests a metabasaltic paragenesis, while low Ca content may 
reflect Ca partitioning into coexisting Ca-Pv at the base of the mantle 
transition zone or the uppermost lower mantle. 
Extended Data Fig. 4 | Diamond sample 110208425476. a, Optical 
microscope image of the whole diamond, showing multiple dark 
inclusions of ferropericlase. b, Polishing down the table facet slightly 
exposed this group of four ferropericlase inclusions, shown here in an 
electron backscatter image. The smeared texture on the largest inclusion 
is a small amount of iron inadvertently deposited on the surface during 
polishing on a conventional cast iron scaife. c, Cathodoluminescence 
image of the whole diamond, revealing a complex dislocation network 
pattern, with interspersed healed fractures, that records a combination of 
both plastic and brittle deformation. Scale bars: 500 μm (a), 200 μm (c). 
Extended Data Fig. 5 | Multiphase Fe-S-C-O metallic inclusion in 
sample 110208245246 (inclusion B). a, Optical microscope image of 
inclusion B, while still contained within the diamond host. b, Electron 
backscatter image of the inclusion, after polishing to expose a 
cross-section through it. The three main phases (Fe-sulphide, Fe-carbide 
and Fe-oxide) are colour-coded in the right 
panel. c, X-ray spectra obtained with EDS, showing the qualitative 
elemental composition of each of the three phases. Carbon is present in 
all spectra owing to the diamond host, diamond particles embedded in 
the polished surface (black specks, especially in the sulphide phase), as 
well as the carbon inherent to the Fe-carbide phase. d, X-ray elemental 
maps obtained with EDS, showing the spatial variation in signal in the 
region of Fe, S and O peaks (Kα1). Sulphur delineates the Fe-sulphide 
phase. Oxygen marks the Fe-oxide phase, while also showing the variable 
oxidation/tarnish layer on the sulphide portion of the inclusion. 
Extended Data Fig. 6 | Two inclusions exhibiting a large 
pressure-induced shift in Raman features. a, CaSiO3 walstromite (thought to 
be former Ca-Pv) in sample 110203744064, inclusion A, with the three 
main peaks shifted to higher wavenumbers compared to a zero-pressure 
reference spectrum. This inclusion also contains CH4. b, Coesite (SiO2, 
thought to be former stishovite) in sample 890000180198. The inclusion 
analysed (circled) is about 2 μm wide and lies in a planar lobate healed 
crack, along with other related ‘satellite’ inclusions that presumably 
surrounded a nucleus coesite inclusion that was polished away when this 
diamond was facetted. Neighbouring coesite inclusions in b also have 
high, but variable, remnant pressures, as reflected by the Raman spectra. 
Reference spectra are from ref. 19 and RRUFF-X050094, and zero-pressure 
reference peak positions are taken from the literature. 
Extended Data Fig. 7 | Dislocation network pattern in sample 
110208245246, as seen in panchromatic cathodoluminescence. Each 
of the bright web-like lines is made up of many dislocations, and these 
cathodoluminescent boundaries surround darker, low-strain domains. The 
dark curved feature on the right of the centre is a crack (not healed). 
Extended Data Table 1 | Suite of 46 type IIb diamond samples and inclusion summary 

Columns: Sample; Mass (carats); Observed single- or multi-phase inclusions. 
Inclusions are interpreted as: [Ca-Pv, Ca-silicate perovskite]; *[Bridgmanite, Mg-silicate perovskite]; [Stishovite]; [Ferropericlase]; [sodic majoritic garnet]; †[Bridgmanite, aluminum bearing]. Mineral abbreviations are given in the footnote. 

Basaltic compositional association, containing CF, Stishovite, or sodic majoritic garnet 
110208245246 0.08 [Wal], [Opx + Jeffbenite + Spinel + Ilmenite], [Nepheline + Olivine + Spinel + CH4], [FeS + Fe-carbide + FeO] 
880000597423 0.13 [Pyroxene + Spinel + Corundum + Olivine], [Coesite + trace Px + CH4] 
101024478345 0.57 [Cpx + Opx + Jeffbenite], [Coesite], ‡[possible sulfide], §[small unidentified opaque], §[Calcite, in healed cracks] 
890000180201 0.08 [Pyroxene + Jeffbenite], [Coesite + Kyanite] 
110206933429 1.01 [Wal + Larnite], [Cpx + Jeffbenite + Spinel], [Coesite], ‡[possible sulfide], ‡[Perovskite, may be CaSiO3 or CaTiO3] 
100521978876 0.13 [Wal + Larnite + Wollastonite], [Coesite] 
890000180198 0.05 [Wal + Larnite], [Coesite] 
880002502964 0.24 [Cpx + Jeffbenite], [Coesite], [small unidentified opaque] 
880000037816 0.21 [Wal], [NaAl-pyroxene + Jeffbenite; examined with SEM-EDS], [Opx], ‡[possible sulfide] 

May associate with basaltic or peridotitic bulk compositions, not firmly categorized 
110205945970 0.27 [Wal + Larnite], [Opx + Jeffbenite], ‡[possible sulfide] 
110208086814 0.13 [Wal + Wollastonite], [Pyroxene + Jeffbenite + Spinel + Ilmenite + Olivine], [Opx + Spinel], [possible sulfide] 
100717827346 0.61 [Cpx + Opx + Spinel + Olivine] Raman clearly shows two spectrally/spatially distinct pyroxenes 
110208579703 2.01 [Wal + Larnite + CH4], [Opx], [Unidentified opaque, magnetic + CH4], §[lobe of small unidentified inclusions + CH4, magnetic, suspected metallic alloy] 
890000188195 3.46 [Wal + Larnite + CH4] 
890000700302 0.61 [Wal + Larnite + CH4 + H2], §[lobe of small unidentified inclusions + CH4, magnetic, suspected metallic alloy] 
110203744064 2.70 [Wal + CH4] 
110208093607 24.18 [Wal + Larnite] Cullinan Dream diamond, 122.52 carat rough mass 
890000562519 17.09 [Wal + Larnite] 
110108200990 3.81 [Wal + Larnite] 
110208258915 2.26 [Wal + Larnite] 
110207965506 0.46 [Wal + Larnite] 
110207872703 2.46 [Wal + Larnite] 
110208241567 2.15 [Wal + Larnite] 
110206861587 0.73 [Wal + Larnite] 
110208948788 0.32 [Wal + Larnite] 
110208247369 2.08 [Wal] 
110207972434 10.67 [Wal] 
DVBT 0.03 [Wal], §[small Fe-Ni-S inclusions, examined with SEM-EDS, in healed lobate crack] 
110208423120 4.06 [Wal + Pseudowollastonite] 
880001287015 0.35 [Wal + Larnite + Pseudowollastonite], [Unidentified opaque, magnetic + CH4 + H2] 
890000508981 0.30 [Opx + Jeffbenite] 
110208122379 6.08 [Unidentified opaque, 655 band like wüstite] 
110208768050 0.42 ‡[possible sulfide] 
110208084323 1.02 ‡[possible sulfide] 
110208135763 1.71 [graphitic fracture rosette hides inclusion; suspected Wal or sulfide based on appearance] 
110207974950 2.97 [graphitic fracture rosette hides inclusion; suspected Wal or sulfide based on appearance] 


Peridotitic compositional association, with fPer and/or Opx having distinctly sharp enstatite Raman spectrum and no Al-phases 
110208773706 0.32 [Wal + Larnite], [Opx + CH4 + H2] 
110207974945 0.78 [Wal], [Opx + Olivine + CH4 + H2, plus weak 253+376 may be lepidocrocite γ-FeOOH], [fPer], [fPer + unidentified opaque + CH4], §[lobe of small unidentified inclusions + CH4] 
110207974941 0.92 [Opx + CH4], [Unidentified opaque, magnetic + CH4 + H2, suspected metallic alloy] 
890000076656 0.48 [Wal + Larnite], [Opx + Olivine + CH4] 
110207974949 2.80 [Wal + Larnite], [fPer] 
110207974948 2.17 [Wal + Larnite], [fPer], §[lobe of small unidentified inclusions + CH4, plus weak 253+376 may be lepidocrocite γ-FeOOH] 
110208780369 5.02 [Wal + Pseudowollastonite + CH4], [fPer], [fPer + minor sulfate or phosphate, and sulfide phases] 
110207974947 1.45 [fPer + Unidentified opaque], [graphitic fracture, inclusion nucleus not visible] 
110207974942 2.31 [fPer + Unidentified opaque], §[Dolomite, in healed cracks] 
110208425476 0.03 §[fPer, irregularly shaped inclusions, pervaded by healed fractures; XRD shows fPer + Olivine + nyerereite] 


Wal, CaSiO3 walstromite; Cpx, clinopyroxene; Opx, orthopyroxene (resolvable doublet in the 660-680 cm⁻¹ region); fPer, ferropericlase. Pyroxene not specified as clino- or ortho-pyroxene was not
confidently subcategorized from its asymmetric main peak at 660-690 cm⁻¹. The identified spinel has Raman spectral features like those of MgAl2O4 and FeAl2O4*°, a prominent peak at ~750 cm⁻¹,
and other peaks matching spinel reported in other retrogressed CF-phase and Al-bridgmanite inclusions!“. The possible sulphide phase has a Raman spectrum resembling arsenopyrite and other
As or Sb sulphides, with weak, sharp, variable peaks in the region 100-350 cm⁻¹. Pseudowollastonite is a high-temperature CaSiO3 polymorph, found here as an accessory
within three calcium silicate inclusions (main peaks: 985, 580, 374 and 142 cm⁻¹).

*Bridgmanite identification based on the presence of orthopyroxene, with sharp Raman peaks matching enstatite (Mg2Si2O6) with a clearly resolved doublet in the 660-680 cm⁻¹ region. The absence
of jeffbenite/spinel suggests low-Al bridgmanite. 

†Aluminous bridgmanite is the likely precursor for these multi-phase assemblages. However, some of these inclusions may actually represent original majoritic garnet if the pyroxene phase were to
be identified as NaAl-pyroxene, which can be difficult to resolve in some cases. It should be stressed that even with chemical analysis, interpretation of these multi-phase retrogressed inclusions is not 
straightforward, and the significance of jeffbenite in retrogressed assemblages remains a matter of debate!”. 

‡Small (<10 µm, often <5 µm) and rare inclusion, regarded as a minor accessory inclusion.

§Inclusions in healed cracks, of mantle origin but trapped or modified post-growth. 
Hurricanes are catastrophically destructive. Beyond their toll
on human life and livelihoods, hurricanes have tremendous 
and often long-lasting effects on ecological systems}. Despite 
many examples of mass mortality events following hurricanes>°, 
hurricane-induced natural selection has not previously been 
demonstrated. Immediately after we finished a survey of Anolis 
scriptus—a common, small-bodied lizard found throughout the 
Turks and Caicos archipelago—our study populations were battered 
by Hurricanes Irma and Maria. Shortly thereafter, we revisited the 
populations to determine whether morphological traits related to 
clinging capacity had shifted in the intervening six weeks and found 
that populations of surviving lizards differed in body size, relative 
limb length and toepad size from those present before the storm. 
Our serendipitous study, which to our knowledge is the first to use 
an immediately before and after comparison‘ to investigate selection 
caused by hurricanes, demonstrates that hurricanes can induce 
phenotypic change in a population and strongly implicates natural 
selection as the cause. In the decades ahead, as extreme climate 
events are predicted to become more intense and prevalent”*, our 
understanding of evolutionary dynamics needs to incorporate the 
effects of these potentially severe selective episodes”"1. 

Hurricanes bring death and destruction to the ecosystems in their 
path. Their effects are myriad—decimating populations*, reshuf- 
fling plant and animal communities’?>!”, and fundamentally altering 
ecosystem cycles!3, Long-term demography studies have provided 
numerous examples of substantial mortality events in plants’, 
sponges!>, land snails‘, stick insects‘, lizards!®, birds® and monkeys? 
due to hurricanes. It remains an open question, however, whether hur- 
ricanes cause selective mortality on the basis of phenotypic traits or are 
instead so indiscriminately destructive that survival is random with 
regard to phenotype. 

The fall of 2017 was an extraordinary season for Atlantic 
storms with three record-breaking events: Hurricanes Harvey, Irma 
and Maria. Four days before Hurricane Irma made landfall in the 
Turks and Caicos Islands, we finished surveying the morphology of 
A. scriptus on two small islands (Pine Cay and Water Cay) in the archi- 
pelago (Fig. 1). Hurricane Irma hit the islands with sustained 265-kph 
winds and, two weeks later, Hurricane Maria followed with sustained 
200-kph winds. 

Six weeks after our initial survey and three weeks after Hurricane 
Maria, we returned to Pine Cay and Water Cay, resurveyed the same 
transects on both islands, caught a sample of the surviving lizards and 
measured their morphology (Methods). Decades of previous research 
on Anolis species have demonstrated that the size of their toepads and 
the length of their limbs are related to habitat use and locomotor per- 
formance. We therefore tested whether the mean toepad surface area'” 
and limb lengths’ of the surviving lizards were larger than those of the 
lizards in the populations that we initially surveyed, as these increases 
are predicted to improve clinging ability'”"®. 


We found parallel shifts in limb and toepad morphology 
between the pre- and post-hurricane populations on both islands 
(see Supplementary Information for complete results and model out- 
put). A multivariate analysis of covariance on all limb components 
revealed that the morphology of the post-hurricane lizard popula- 
tions differed significantly from the pre-hurricane lizard populations 
(Fi 46 = 18.278, P < 0.0001), and that the post-hurricane shifts were
parallel on both islands (there was no hurricane × island interaction;
Fy45 = 1.377, P = 0.1833). The multivariate analysis of covariance structure
coefficients indicate relative femur length most significantly distinguished
the pre- and post-hurricane populations (see Supplementary
Information). 

Post hoc analyses examining traits individually indicate that the sur- 
viving populations of both islands had proportionately longer humeri 
(relative humerus length: β ± s.e.: 0.03 ± 0.008; t159 = 3.64; P = 0.0004;
Fig. 2); the average humerus length increased by 1.8%, despite a
significant 1.4% decrease in mean body size (snout-to-vent length
(SVL): β ± s.e.: 1.20 ± 0.484; t159 = −2.483; P = 0.0141). By contrast, we
observed a significant 6% decrease in relative femur length across both
populations after the hurricane (β ± s.e.: −0.05 ± 0.009; t159 = −5.92;
P < 0.0001; Fig. 2) and a decrease of 4.6% in the length of the longest
toe (relative longest toe length: β ± s.e.: −0.03 ± 0.009; t159 = −3.75;
P = 0.0002; Supplementary Information).

On average, the post-hurricane lizards of both islands had larger 
toepads on both their forelimbs and hindlimbs (relative forelimb 
toepad area: β ± s.e.: 0.13 ± 0.017; t153 = 7.93; P < 0.0001; relative hindlimb
toepad area: β ± s.e.: 0.10 ± 0.015; ti5g = 6.819; P < 0.0001; Fig. 2),
which correspond to increases in population means of 9.2% and 6.1%
for forelimb and hindlimb toepad area, respectively. We also found a
parallel sex × hurricane interaction in body size (β ± s.e.: 2.87 ± 0.949;
t159 = 3.027; P = 0.003); on average, male SVL decreased by 4.3%
whereas female SVL increased by 0.9%. 

Despite the overall trend for parallel shifts, which was evident in both 
the multivariate analysis of covariance and many of the morphological 
traits, we found that forelimb toe length showed a differing response 
between the two islands (hurricane × island interaction: β ± s.e.:
−0.05 ± 0.025; t159 = −2.11; P = 0.036), decreasing by 3.9% on Pine
Cay and increasing by 2.2% on Water Cay. In addition to the parallel 
and non-parallel changes detected for many traits, we did not detect a 
difference between pre- and post-hurricane populations in the length 
of any other segments of the limbs, nor in the number of lamellar scales 
on the forelimb or hindlimb toepads (Supplementary Information). 

We next considered what might be responsible for the parallel shifts 
in phenotypes in the two populations. Two lines of evidence suggest 
that natural selection favoured individuals able to survive the hurri- 
canes. First, if the hurricanes caused directional selection, then the 
survivors should exhibit reduced trait variation after the hurricanes®. 
We tested this prediction by calculating the variance in each of the 
measurements that showed a significant shift in their mean after the 
hurricanes. The variance in all six of these traits decreased among the
surviving A. scriptus on Pine Cay, and decreased in four of the six on
Water Cay (Supplementary Information), a result that is unlikely to
have occurred by chance (P = 0.019 using the binomial test; analyses
on principal component axes gave similar results, see Supplementary
Information).

Fig. 1 | Location of Pine Cay and Water Cay with respect to Hurricanes
Irma and Maria. Pine Cay and Water Cay are located in the Turks and
Caicos Islands in the West Indies. They are home to the endemic Turks
and Caicos anole, Anolis scriptus. On 8 September 2017, Hurricane Irma
directly hit Turks and Caicos (black circle), shown in the water vapour
satellite maps (from NOAA, www.goes.noaa.gov). Two weeks later, on 22
September 2017, Hurricane Maria struck Turks and Caicos. Map data:
Google, © 2018 DigitalGlobe.

Fig. 2 | Parallel shifts in body-size-corrected humerus and femur length
and surface area of the forelimb and hindlimb toepads for lizards on
both Pine Cay and Water Cay. Dashed lines are the linear best fit of lizards
measured before the hurricane (n = 71), which are represented by open circles.
Filled circles are lizards from after the hurricane (n = 93) with solid lines of
best fit. The grey-shaded areas correspond to 95% confidence intervals.
The horizontal axis is snout-to-vent length (mm).

Table 1 | Variance-standardized selection differentials

Trait                   Pine Cay   Water Cay
Snout-to-vent length    −0.17      −0.05
Humerus                  0.40       0.8
Femur                   −0.67      −0.89
Hindlimb longest toe    −0.67      −0.39
Forelimb toepad          1.18       1.04
Hindlimb toepad          0.88       1.08

See Methods for details. Variance-standardized selection differentials for humerus length,
femur length, hindlimb longest toe length and toepad surface areas were calculated using
body-size-corrected residuals.

Second, survivors had traits associated with greater clinging ability.
The positive relationship between toepad size and clinging capacity
among anoles is well-established”; we confirmed this relationship
for A. scriptus (β ± s.e.: 0.031 ± 0.01; tg6 = 3.26; P = 0.0016; Methods).
The larger toepads of surviving lizards support the hypothesis that
natural selection favoured individuals with greater clinging capacity,
which were able to withstand high winds. The observed shifts in limb
length may have a similar functional explanation. Previous work has
demonstrated that more force is needed to pull long-limbed anoles off
a perch!®. The longer forelimbs of surviving A. scriptus may have been
beneficial during the hurricanes for this reason. However, the parallel
decrease in femur and hind toe length was contrary to our predictions.
Preliminary experimental trials (Supplementary Information) suggest
one potential explanation: because of the posture A. scriptus adopts
when exposed to high winds, longer hindlimbs could present a larger
exposed surface area, which would increase the likelihood that a lizard
would be blown off its perch.
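
The binomial test on the variance reductions (all six traits on Pine Cay, four of six on Water Cay) can be checked by hand. The sketch below is one plausible reading rather than the authors' documented procedure: it assumes the twelve trait-by-island comparisons are pooled, that a decrease and an increase are equally likely under the null hypothesis, and that the test is one-sided. The function name is illustrative. Under those assumptions the tail probability rounds to the reported 0.019.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from the text: 6 of 6 traits on Pine Cay and 4 of 6 on Water Cay
# showed reduced variance. Pooling them into 12 comparisons is an assumption.
decreases = 6 + 4
comparisons = 6 + 6
p_value = binomial_upper_tail(decreases, comparisons)
print(f"{decreases}/{comparisons} decreases, one-sided P = {p_value:.3f}")
```

Treating the twelve comparisons as independent is itself a simplification, because limb and toepad traits measured on the same lizards are correlated.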

Strong directional selection that favoured individuals able to hold 
tight during hurricane-force winds is a likely explanation for the paral- 
lel shifts in phenotype we observed. Of course, other explanations are 
possible: the hurricanes wrought environmental changes that affected 
vegetation structure, thermal microclimates and—most probably— 
food availability (though if lizards were starving, body condition 
would decrease, which on average it did not; β ± s.e.: 0.017 ± 0.019;
t161 = −0.867; P = 0.3874; Supplementary Information). Although we
cannot rule out these potential selective pressures, the fact that traits 
associated with clinging ability changed in ways that should increase 
survival in high winds lends credence to our hypothesis that the 
storms, and not their aftermath, drove these changes. Whatever the 
selective force was, selection must have been strong. For context, we 
calculated selection differentials for body size and the size-corrected 
morphological components that demonstrated a significant shift in 
their mean (Table 1), and found that they exceeded the majority of 
published selection differentials”’-*” and were of comparable magni- 
tude to the selection experienced by Darwin's finches in two famously 
harsh periods”?. 

Evolutionary processes other than natural selection could con- 
ceivably have produced these patterns, though we consider them 
implausible. We cannot rule out that lizard dispersal from elsewhere— 
unsampled microhabitats, other parts of the islands or other islands 
altogether—caused the observed phenotypic shifts. However, both 
Pine Cay and Water Cay are small islands with homogeneous vegetation
structure and all available microhabitats were thoroughly sampled. The 
appearance of morphologically different migrants would have increased 
rather than decreased trait variation in the recipient populations and 
would have had to occur in parallel on both islands (see Methods for 
additional sampling descriptions). We therefore discount this alterna- 
tive explanation. 

Attention to the evolutionary importance of phenotypic plasticity 
has increased in recent years, and adaptive phenotypic plasticity is 
more prevalent than previously realized**. We considered whether 
hurricane-induced phenotypic plasticity, either during the storms or 
as a result of post-hurricane conditions, could be responsible for the 
phenotypic shifts that we documented. In lizards, adult limb length can 
be affected by perch use during ontogeny”*”®, and bone shrinkage has 
been documented in starving marine iguanas”. By contrast, plasticity 
in toepad size has not been reported”®, All of these studies, however, 
documented responses that occurred over long exposure periods or 
during ontogeny; our comparison spanned six weeks and was restricted 
to adults. Lastly, plastic decreases in skeletal elements have only previ- 
ously been observed in response to food stress”; if food stress were 
the cause here—and our body condition data suggest this was not the 
case—it would have decreased the length of all skeletal elements. For 
these reasons, we consider phenotypic plasticity to be an unlikely expla- 
nation for the observed phenotypic shifts. 
The role of extreme climate events in driving evolution is of
pressing interest, and although some examples due to prolonged heat, 
cold or drought have been documented (reviewed in Grant et al.!), 
additional studies are needed in our era of rapidly changing and inten- 
sifying climate extremes”. Previous studies”'!*° have only alluded 
to the evolutionary ramifications of hurricanes. A 20-year study of a 
smaller lizard species, Anolis sagrei, on much smaller islands in the 
Bahamas—which are more vulnerable to hurricane-induced storm 
surges’® than the islands in this study—showed that after the popula- 
tions were hit by hurricanes, in the next sampling period mean tibia 
length had increased*”. Whether these shifts were the result of natural 
selection and, if so, whether the selection had occurred during the 
hurricane itself or in the subsequent 8-48 months before the population 
was resampled remains unknown. 

The long-term evolutionary consequences of these hurricanes 
on A. scriptus remain to be seen. Despite the extensive work on 
Anolis lizards, the biology of A. scriptus is little known. Indeed, our 
initial survey was conducted to gather baseline data on the natural 
history of the species in anticipation of a conservation project. That 
survey provided a serendipitous baseline from which to measure 
this selection event; future work is needed to determine whether the 
within-generational selection that we documented translates into 
evolutionary change across generations. 

The macroevolutionary importance of infrequent but severe selective 
events such as these is an open question!'. On the one hand, if such 
events are rare then natural selection in intervening periods might be 
expected to erase the signature of infrequent bouts of extreme selection. 
On the other hand, if hurricanes occur frequently enough—or if the 
selection is strong enough—then present-day populations may bear 
the mark of such events, in which case selection during normal years 
would not be able to fully explain current phenotypic distributions. In 
this light, it is notable that Anolis species occupying Caribbean islands 
have substantially larger toepads relative to their body size and habitat 
use than do congeners on mainland Central and South America®’. Why 
this difference exists has long been questioned, but our findings suggest 
the possibility that hurricane-induced selection, a much more common 
occurrence on islands than on the mainland, may be responsible for 
this macroevolutionary pattern. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
018-0352-3. 
METHODS


No statistical methods were used to predetermine sample size. 

Pine Cay and Water Cay are islands in the Turks and Caicos archipelago. Pine 
Cay, the larger of the two, is approximately 350 ha, sparsely inhabited by approx- 
imately 20 private residences, and covered with low vegetation (1-3 m in height). 
Water Cay (250 ha) was a separate island until the mid-1990s, when Water and 
Pine Cays were connected by a sand bridge deposited by a hurricane storm surge. 
Although some patches of vegetation grow between the islands, the narrow sand 
bar probably strongly inhibits the movement of lizards between the two islands; 
no lizards were observed on our trips crossing it. 

A. scriptus, the Turks and Caicos anole, is a small (adult SVL 40-70 mm) and 
largely insectivorous lizard. It is abundant on Pine and Water Cays, readily detect- 
able and typically found perched on vegetation at or below 3 m in height”. Our 
initial pre-hurricane survey of the anoles of these cays for an ongoing conservation 
project started on 28 August 2017 and lasted until 4 September 2017. As we did not 
anticipate investigating hurricane-induced selection, none of the captured lizards 
were permanently marked. Hurricane Irma hit Turks and Caicos on 8 September 
2017, followed by Hurricane Maria on 22 September 2017. Our post-hurricane visit 
spanned from 16 October 2017 to 20 October 2017. To sample the lizard populations,
C.M.D., A.-C.F. and A.H. walked a transect approximately 2-km long on each
island, catching lizards whenever sighted with a noose and pole. These transect 
paths were repeated by C.M.D. and A.H. in the post-hurricane revisit. In total, we 
caught 71 adult lizards in the initial survey and 93 during the revisit. The increased 
number of lizards in the second visit reflects an increase in sampling time; care was 
taken to sample in all of the same microhabitats before and after the hurricane. 

Both Pine Cay and Water Cay are small islands with homogeneous vegetation 
structure; consequently, microgeographic morphological variation within each 
island is likely to be minimal and we attempted to survey what variation does 
exist within our 2-km-long transects. We therefore think it unlikely that dispersal 
of individuals from elsewhere on the island into our sampling area would shift the 
population mean. Further, if migrants were morphologically different from the 
initial study population, it is unlikely that migration from outside the study site 
would shift phenotypic distributions in the same way on both islands. Moreover, if 
migrants arrived from one or more phenotypically distinct populations, the result 
would be more likely to increase than decrease the variation within the recipient 
population, in contrast to our findings. This last point also applies to the possibility 
that lizards were blown in from another island (the closest point given the direction 
of storm being North Caicos Island, 7 km away), an explanation that we consider 
to be even less likely. 

An additional consideration inextricably linked to capturing samples of animal 
populations is that during our pre-hurricane surveys we failed to detect morpho- 
logically distinct lizards occupying unique microhabitats (for example, tree cano- 
pies). If those lizards became more apparent after the hurricanes, they could change 
the sample trait means, and spuriously suggest population-wide trait shifts. This 
possibility seems unlikely for several reasons: we thoroughly sampled all available 
microhabitats along the capture transects; A. scriptus individuals are rarely found 
above 3 m™ and spend the majority of their time within 1.5 m of the ground where 
they are particularly easy to spot. More generally, previous studies have found no 
evidence to indicate that, within anole populations, individuals that use different 
microhabitats differ in limb or toepad characteristics***“. The proportion of indi- 
viduals with extreme values for the morphological traits in the post-hurricane 
samples suggests that mortality was probably very high; however, because we did 
not estimate population size or individually mark lizards, change in population 
size could not be estimated. 

During both surveys, A.H. measured the morphology (snout-to-vent length, 
and length of the humerus, radius, metacarpal, longest forelimb toe, femur, tibia, 
metatarsal and longest hindlimb toe**) of each individual using digital calipers
(Mitutoyo 500-752). In addition, a photograph was taken of the right fore and hind
feet of each lizard, unless a digit was missing—in which case the left was photo- 
graphed. All photographs were captured with an iPhone 7 using a Moment Macro 
Lens attachment. Using ImageJ (v.1.51a., W. Rasband, National Institutes of Health,
Bethesda), C.M.D. measured the toepad area of the longest toe three times for each
forelimb and hindlimb. The repeated measurements were highly consistent and the
average was used for analyses (intraclass correlation coefficient for forelimb
toepad area was 98.9% (95% confidence interval (CI): 98.6-99.2%) and for hindlimb
toepad area was 99.6% (95% CI: 99.5-99.7%)). C.M.D. also counted the number
of lamellae on the toepad of the longest toe on the right forelimb and hindlimb. 
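
The repeatability of triplicate toepad measurements can be illustrated with an intraclass correlation coefficient. The text does not say which ICC form was used, so the sketch below assumes the one-way random-effects form, ICC(1,1); the function name and the toepad areas are hypothetical and are not taken from the study data.

```python
from statistics import mean


def icc_oneway(measurements: list[list[float]]) -> float:
    """One-way random-effects ICC(1,1) for n subjects measured k times each."""
    n = len(measurements)
    k = len(measurements[0])
    subject_means = [mean(row) for row in measurements]
    grand_mean = mean(subject_means)  # equals the overall mean when k is constant
    # Mean square between subjects and mean square within subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical toepad areas (mm^2): four lizards, three repeated measurements each.
toepad_repeats = [
    [2.10, 2.12, 2.11],
    [1.80, 1.79, 1.81],
    [2.45, 2.44, 2.46],
    [1.95, 1.97, 1.96],
]
icc = icc_oneway(toepad_repeats)
print(f"ICC = {icc:.1%}")
```

When the spread between lizards is much larger than the spread among repeats of the same toepad, the coefficient approaches 100%, which is the pattern reported above.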

Lizard clinging performance measurements were taken using a Vernier Dual
Range Force Sensor DFS-BTA with an acetate transparency as a gripping surface!”.
Following standard protocols”, the right forelimb of each lizard was pulled down 
the surface by C.M.D. and the maximum force exerted by the forelimb toepads was 
recorded and analysed by A.-C.F. Methods for the wind speed behavioural study 
are presented in the Supplementary Information. All procedures were approved by 
Harvard IACUC (26-11) and all lizards were released unharmed after the experi- 
ment to their point of capture. 
Analyses. All analyses were conducted in R** on adult lizards (female SVL 
>40 mm, male SVL >45 mm). We used a multivariate analysis of covariance to 
evaluate whether limb morphology varied between lizards on both islands before 
and after the hurricane. To do so, we log-transformed all morphometric measure- 
ments and used sex as a fixed effect and body size (SVL) as a covariate. The fixed 
effects of interest were island of origin (Water Cay or Pine Cay), hurricane treat- 
ment (pre-/post-) and their interaction. We did not detect a three-way interaction 
with sex. Following the multivariate analysis of covariance, we conducted post hoc 
general linear models on individual limb traits and SVL. An interaction of hurri- 
cane and island of origin was tested in the linear model and removed if the inter- 
action was not significant. For all models, sex was included as a fixed effect. For all 
relative morphometric analyses (that is, comparisons taking into account differences
in body size) we used SVL as a covariate. We used the ‘lsmeans’ and ‘effects’
packages in R to test for significance in the model estimates (see Supplementary 
Information for additional analysis details). 
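
The data preparation described in this paragraph can be sketched as follows. This is not the authors' R code: the field names, the example records and the use of natural logarithms are assumptions made for illustration, and only the adult size thresholds and the model terms (sex, SVL, island, hurricane and their interaction) come from the text.

```python
import math

# Adult size thresholds (mm) as stated in the Analyses paragraph.
ADULT_MIN_SVL_MM = {"F": 40.0, "M": 45.0}


def prepare_rows(records):
    """Keep adults, log-transform measurements and dummy-code the factors."""
    rows = []
    for r in records:
        if r["svl_mm"] <= ADULT_MIN_SVL_MM[r["sex"]]:
            continue  # lizards at or below the threshold are not analysed
        post = 1 if r["period"] == "post" else 0
        water_cay = 1 if r["island"] == "Water Cay" else 0
        rows.append({
            "log_humerus": math.log(r["humerus_mm"]),  # response variable
            "log_svl": math.log(r["svl_mm"]),          # body-size covariate
            "male": 1 if r["sex"] == "M" else 0,       # sex as a fixed effect
            "post": post,                              # hurricane treatment
            "water_cay": water_cay,                    # island of origin
            "post_x_water_cay": post * water_cay,      # hurricane x island term
        })
    return rows


# Hypothetical records; the second lizard is a sub-adult male and is dropped.
records = [
    {"sex": "F", "svl_mm": 43.1, "humerus_mm": 8.2, "island": "Pine Cay", "period": "pre"},
    {"sex": "M", "svl_mm": 44.0, "humerus_mm": 8.5, "island": "Pine Cay", "period": "post"},
    {"sex": "M", "svl_mm": 52.3, "humerus_mm": 10.1, "island": "Water Cay", "period": "post"},
    {"sex": "F", "svl_mm": 41.0, "humerus_mm": 7.9, "island": "Water Cay", "period": "pre"},
]
rows = prepare_rows(records)
print(len(rows), "adult lizards retained")
```

In the workflow described above, the interaction column would be dropped from a post hoc model whenever its coefficient was not significant.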

To calculate selection differentials, we transformed SVL as well as the body-size- 
corrected humerus, femur and toepad areas to have a zero mean and unit variance. 
The pre-hurricane population means were then subtracted from the post-hurricane
population means to calculate the differential”. We compared these differentials to
other published studies””-*?, most of which, however, were calculated over longer 
periods. We calculated body condition as residual of a linear regression between 
body mass and snout-to-vent length. 
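
A minimal sketch of a variance-standardized selection differential on a body-size-corrected trait follows. It assumes that size correction means taking residuals from a simple regression of the trait on SVL, that standardization uses the pooled pre- and post-hurricane sample, and that the differential is the post-hurricane mean minus the pre-hurricane mean, which is the sign convention that matches Table 1. The function names and data are hypothetical.

```python
from statistics import mean, stdev


def size_corrected_residuals(size, trait):
    """Residuals from an ordinary least-squares regression of trait on size."""
    size_mean, trait_mean = mean(size), mean(trait)
    slope = (
        sum((s - size_mean) * (t - trait_mean) for s, t in zip(size, trait))
        / sum((s - size_mean) ** 2 for s in size)
    )
    intercept = trait_mean - slope * size_mean
    return [t - (intercept + slope * s) for s, t in zip(size, trait)]


def selection_differential(pre, post):
    """Post-minus-pre difference in means after pooled standardization."""
    pooled = list(pre) + list(post)
    centre, spread = mean(pooled), stdev(pooled)  # zero mean, unit variance
    z_pre = [(v - centre) / spread for v in pre]
    z_post = [(v - centre) / spread for v in post]
    return mean(z_post) - mean(z_pre)


# Hypothetical data: five lizards before and five after the hurricanes.
svl_mm = [48.0, 50.5, 53.0, 55.5, 58.0, 47.5, 50.0, 52.5, 55.0, 57.5]
toepad_mm2 = [1.90, 2.05, 2.20, 2.30, 2.45, 2.05, 2.20, 2.35, 2.50, 2.60]
residuals = size_corrected_residuals(svl_mm, toepad_mm2)
differential = selection_differential(residuals[:5], residuals[5:])
print(f"standardized selection differential: {differential:.2f}")
```

The same residual function, applied to body mass regressed on SVL, gives a body condition index of the kind described in the last sentence of the paragraph.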

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 
Data availability. All morphological and performance data collected for this study 
are available on Dryad, https://doi.org/10.5061/dryad.2t41164. 
Ecosystem restructuring along the Great Barrier
Reef following mass coral bleaching 


Rick D. Stuart-Smith!*, Christopher J. Brown?, Daniela M. Ceccarelli? & Graham J. Edgar! 


Global warming is markedly changing diverse coral reef ecosystems 
through an increasing frequency and magnitude of mass bleaching 
events'~?. How local impacts scale up across affected regions 
depends on numerous factors, including patchiness in coral 
mortality, metabolic effects of extreme temperatures on populations 
of reef-dwelling species‘ and interactions between taxa. Here we 
use data from before and after the 2016 mass bleaching event to 
evaluate ecological changes in corals, algae, fishes and mobile 
invertebrates at 186 sites along the full latitudinal span of the Great 
Barrier Reef and western Coral Sea. One year after the bleaching 
event, reductions in live coral cover of up to 51% were observed 
on surveyed reefs that experienced extreme temperatures; however, 
regional patterns of coral mortality were patchy. Consistent declines 
in coral-feeding fishes were evident at the most heavily affected 
reefs, whereas few other short-term responses of reef fishes and 
invertebrates could be attributed directly to changes in coral cover. 
Nevertheless, substantial region-wide ecological changes occurred 
that were mostly independent of coral loss, and instead appeared to 
be linked directly to sea temperatures. Community-wide trophic 
restructuring was evident, with weakening of strong pre-existing 
latitudinal gradients in the diversity of fishes, invertebrates and their 
functional groups. In particular, fishes that scrape algae from reef 
surfaces, which are considered to be important for recovery after 
bleaching’, declined on northern reefs, whereas other herbivorous 
groups increased on southern reefs. The full impact of the 2016 
bleaching event may not be realized until dead corals erode during 
the next decade”*. However, our short-term observations suggest 
that the recovery processes, and the ultimate scale of impact, are 
affected by functional changes in communities, which in turn 
depend on the thermal affinities of local reef-associated fauna. Such 
changes will vary geographically, and may be particularly acute at 
locations where many fishes and invertebrates are close to their 
thermal distribution limits’. 

The 2016 mass bleaching event affected coral reefs world-wide, with 
catastrophic impacts reported in the Red Sea, central Indian Ocean, 
across the Pacific Ocean and in the Caribbean**?. The Australian 
Great Barrier Reef (GBR), the largest coral reef system in the world, 
experienced the warmest temperatures on record for the region. An 
estimated 91.1% of reefs along the GBR experienced some bleaching’, 
resulting in an estimated loss of approximately 30% of live coral cover 
over the following six months’. The event was thus comparable to the 
1998 mass bleaching event in the Indian Ocean in terms of reported 
impacts on corals”’!. We surveyed 186 reef sites along the GBR and 
at less-studied isolated reefs in the Coral Sea before and after the 
2016 bleaching event, and here we report reef- and regional-scale 
effects of the extreme thermal anomaly and loss of coral cover on the 
rich reef-associated fish and mobile invertebrate fauna. At each site, 
globally standardized Reef Life Survey census methods!” were used 
to quantify changes to coral cover, reef fishes and mobile macroinvertebrates
at multiple depths (overall mean, 6.7 m; range, 0.8-17.0 m).
‘Before’ data were obtained between 2010 and 2015, and ‘after’ data
were obtained 8-12 months after bleaching. 

As reported elsewhere!®, decreases in live hard coral cover were wide- 
spread (Fig. 1), although we found that the regional pattern was more 
spatially heterogeneous than previously described, when field surveys 
were standardized amongst shallow reef crest habitat’®. Forty-four of 
the 186 surveyed sites experienced absolute declines in live coral cover 
that exceeded 10% (up to 51% loss for one site at Osprey Reef), with the 
northern Coral Sea reefs suffering the most consistent losses (Fig. 1a, b). 
The magnitude of coral-cover change was related to the local sea 
temperature anomalies (Fig. 1d and Extended Data Fig. 1), but coral 
loss varied considerably, and not all reefs in regions that experienced 
the greatest temperature anomalies experienced losses in live coral 
cover. In some cases, such as the central Coral Sea reefs, a history of 
cyclone damage meant that there was relatively little coral to lose. Thus, 
geographical patterns in pre-bleaching cover had a critical role in the 
realized effects of bleaching on corals (Fig. 1d). Coral-cover losses of 
the greatest magnitude occurred in disparate locations, including in the 
northern Coral Sea (Boot and Osprey Reefs; mean, 15% absolute cover 
loss, or approximately 40% of the pre-bleaching live coral cover), and 
the southern GBR (most southerly Swain Reefs; 28% loss, or 100% of 
pre-bleaching cover). The northern reefs in the GBR experienced the 
most extensive bleaching of those surveyed during the 2016 event’, but 
not all of the reefs in that area suffered the extreme rates of live coral- 
cover loss that were observed more generally’? (Fig. 1a, b). The fate of 
bleached corals can vary considerably'*"*, and a reasonable proportion 
of corals on some of these reefs must have regained their zooxanthellae 
and survived the bleaching event. Algal cover substantially increased 
across the majority of reefs that experienced coral declines (Fig. 1c and 
Extended Data Fig. 2). 
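
The paragraph above reports coral losses both as absolute changes in percentage cover and as a proportion of pre-bleaching cover. The sketch below shows how the two measures relate. The pre- and post-bleaching covers are back-calculated from the quoted figures for illustration only (a 15-point loss equal to about 40% of prior cover implies roughly 37.5% cover beforehand) and are not values reported in the paper.

```python
def absolute_loss(pre_cover: float, post_cover: float) -> float:
    """Loss in percentage points of live coral cover."""
    return pre_cover - post_cover


def relative_loss(pre_cover: float, post_cover: float) -> float:
    """Loss as a percentage of the pre-bleaching live coral cover."""
    return (pre_cover - post_cover) / pre_cover * 100


# Illustrative covers (% of substrate) implied by the figures quoted in the text.
reefs = {
    "Boot and Osprey Reefs (mean)": (37.5, 22.5),
    "Southern Swain Reefs": (28.0, 0.0),
}
for name, (pre, post) in reefs.items():
    print(
        f"{name}: {absolute_loss(pre, post):.0f} points absolute, "
        f"{relative_loss(pre, post):.0f}% of pre-bleaching cover"
    )
```

The comparison shows why a reef with little coral to begin with can register a small absolute loss while losing all of its live cover.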

Not all coral declines that were observed during the study could be 
assumed to be solely due to the bleaching event (other disturbances, 
such as cyclones, may have also had impacts on corals at particular 
locations; see Methods). To investigate the effects on reef fauna that 
could be most clearly attributable to the bleaching event, we quantified 
changes on a subset of reefs that experienced extreme heating 
and substantial live coral-cover loss (see Methods for criteria). On 
these reefs, the abundance of coral-eating fishes (corallivores) con- 
sistently declined, and declines in local fish species richness were 
also common (Extended Data Fig. 3). Such changes have previously 
been observed as rapid responses to coral bleaching events, and 
are clearly a concerning form of reef-scale biodiversity loss. These 
changes were not observed on a subset of comparison reefs that also 
experienced extreme heating, but that did not experience an observ- 
able loss of live coral cover (Extended Data Fig. 3). Other previously 
reported short-term effects of bleaching, such as increased herbivore 
abundance in response to a boom in algal resources, occurred on 
some study reefs, but were not consistent features of those reefs with 
the clearest impacts on coral cover attributable to bleaching (Extended 
Data Fig. 3). 

Fig. 1 | Observed changes in live hard coral cover from the 2016 mass 
bleaching event across the Coral Sea and GBR. a–c, Reefs in the Coral 
Sea showed relatively consistent losses of live corals (a, b) and gains in 
algal cover (c) in the north, whereas changes along the GBR were highly 
patchy. Absolute changes in live coral cover are mapped for individual 
sites (n = 186), with aggregation of sites at the reef scale shown as plusses 
(n = 53). d, Coral-cover loss was related to the local heat anomaly from 


Coherent patterns of ecological change were evident when assess- 
ing regional-scale trends between survey periods across the full range 
of sites surveyed. The latitudinal gradient in local species richness of 
mobile fauna declined in slope through a combination of decreased 
local fish richness on northern reefs and markedly increased richness 
of macroinvertebrates and small cryptic fishes on southern reefs (Fig. 2 
and Extended Data Fig. 4). The structure of fish communities on south- 
ern reefs became more similar to those in the north (Extended Data 
Fig. 5), a broad-scale homogenization that resulted in a slight decline 
in the overall number of fish species recorded across all surveys (from 
532 to 494). Invertebrate communities also changed considerably 
between survey periods (Fig. 2 and Extended Data Fig. 4). This was 
characterized most clearly by sea urchins being found less frequently 
on northern reefs and in increased abundance on southern reefs after 
the bleaching event. 

A key outcome of these changes was the regional alteration to the 
functional structure of reef communities, with potentially important 
consequences for the recovery of affected reefs. Functional richness 
(represented by the number of unique functional trait combinations 
comprised by fishes and invertebrates on each survey) increased on 
southern reefs, where the potential for local herbivory also increased 
through herbivorous fish biomass gains (Fig. 2 and Extended Data 
Figs. 4, 6) and patchy gains in the abundance of sea urchins. By contrast, 
the frequency of occurrence and biomass of fishes that scrape algae 
and microscopic autotrophs off coral-rock surfaces (scraping 
herbivores), and the frequency of sea urchins declined on northern reefs, 
whereas the biomass of plankton-feeding fishes increased (Figs. 2, 3 
and Extended Data Fig. 4). 

Most of these rapid, regional-scale ecological changes could not be 
linked to coral loss (Extended Data Fig. 4), and so cannot be assumed 
to be indirect effects of the bleaching event (or any other causes of coral 
degradation during the study). Some of these changes could neverthe- 
less result from changes in the local composition and community struc- 
ture of corals and algae, independently of the total amount of coral loss, 
but the spatial footprint of changes in the fishes and the invertebrates 

January to March 2016 regardless of depth, an effect that increased in 
strength according to pre-bleaching cover of live corals. Average pre-bleaching 
cover for the region was 26% (middle), whereas low and high 
(left and right) are shown for ±1 s.d. (19%) from average pre-bleaching 
live coral cover. Effects in d are from Bayesian mixed-effects models, with 
shading representing 95% credible intervals. 


suggests that at least some of these changes were independent of changes 
in habitat. The consistency of ecological change along the latitudinal 
gradient differs from the heterogeneous patterns of the changes in 
coral and algal cover, particularly along the GBR, whereas the southern 
Coral Sea reefs showed very clear ecological changes, despite largely 
escaping bleaching. The loss of large predatory fishes in remote loca- 
tions, such as in the northern GBR and on some reefs in the southern 
Coral Sea (Extended Data Fig. 6), could potentially be associated with 
expansion of the fishing footprint, but this needs further investigation. 
Changes in fishing pressure are unlikely to have resulted in most of the 
other coherent regional scale patterns of changes in the communities, 
because few herbivorous fishes, cryptic fishes and reef-dwelling inver- 
tebrates are targeted by fishers in this region. 

Another potential explanation for the rapid restructuring of commu- 
nities relates to more direct effects of region-wide anomalously warm 
temperatures and altered currents on the local occupancy patterns and 
abundance of different species. Marine heat waves and short-term 
temperature variations have been shown to markedly affect temperate 
rocky reef communities, but have not been well investigated on coral 
reefs. The sea temperatures that were experienced during the bleaching 
event (up to 32 °C in the northern GBR) exceeded those at the 
warm limits of the distributions for the majority of reef fishes that were 
recorded in the region, and many species on northern reefs probably 
experienced thermal stress. 

We used species temperature index (STI) values for fish species 
recorded during surveys to investigate the possibility that reduced species 
richness and altered trophic structure on the warmer northern reefs were 
due to disproportionate effects on species with an affinity for relatively 
cooler seas. STI values are derived from the realized thermal distributions 
of species across their entire range, and provide a nuanced and 
continuous measure of the ocean climate on which the distribution of 
each species is centred. On average, patterns of change in the frequency 
of occurrence of species in each trophic group were positively related to 
their STI values (Fig. 3). Specifically, those species that declined between 
surveys across northern reefs tended to be corallivores and scraping 

Fig. 2 | Changes in latitudinal trends in reef fish and invertebrate 
communities associated with the 2016 mass coral bleaching event. 
a–d, Plots show local richness (log scale). e, Plot shows sea urchin presence 
(log-odds). f–i, Plots show biomass in g per 500 m² (log scale). Latitudinal 
trends are median effect estimates from Bayesian generalized linear 
mixed-effects models (n = 233 site-by-depth-category combinations). 
Shaded regions show the marginal 95% credible intervals, and asterisks 


herbivores with distributions in relatively cooler waters; a pattern that 
was consistent for fish communities along both the GBR and Coral Sea, 
which have different biogeographical affinities (Extended Data Fig. 5). 
Fishes that feed by excavating the coral-rock surface tended to have the 
warmest affinities (that is, higher STI values), and became more common 
in surveys in both the north and south (Fig. 3), although the increased 
frequency in the north did not translate to an increase in local biomass 
(Fig. 2 and Extended Data Fig. 4). 

A bias in thermal affinities of reef fishes related to their trophic group 
has not previously been investigated in detail, and the generality of this 
phenomenon is unknown. In this case, the pattern was characterized 
by high variability (Fig. 3), and becomes increasingly influenced by 
excavators at the scale of the full GBR. The opposite situation may 
occur on temperate reefs, where herbivorous fishes have warmer STIs 
than other trophic groups. Further investigation is needed to determine 
whether biases in STIs of trophic groups are idiosyncratic and 
location-dependent, or whether coherent geographical patterns emerge 
for particular trophic groups. The decreased frequency of occurrence 
of corallivores at northern reefs could also be related to coral mortality, 
with this effect potentially confounded by inferred effects of thermal 
stress (or other causes that were not investigated). 

Ecological change on southern reefs included an increasing similarity 
of fish community structure to that on northern reefs (Extended 
Data Fig. 5), which is consistent with a potential influence of warmer 
temperatures, but could also result from altered currents and possible 
enhanced fish recruitment of northern species in the south. No clear 

indicate those metrics for which the term for the change in latitudinal 
slopes (the interaction between latitudinal and time period effects) has 
95% credible intervals that do not overlap zero (model effect sizes with 
credible intervals for all predictors are shown in Extended Data Fig. 4). 
y axes are on the link scale (log for Poisson and normal, and logit for sea 
urchins). 


signal of an influx of northern species recruits was evident, however, as 
the local richness of juveniles was no greater after the bleaching event 
than before (Extended Data Fig. 7). Instead, the majority of positive 
changes in the south related to taxa of relatively small adult body size, 
both invertebrates and cryptic fishes. These taxa could be more sensitive to 
temperature changes, capable of more rapid increases in local population 
size and/or subject to rapid numerical or behavioural release 
if predation pressure was reduced. Although less probable, 
release from predation may have resulted from minor decreases 
in the frequency of predatory fishes in the Coral Sea (Extended Data 
Fig. 6) and benthic invertebrate consumers in the GBR (Fig. 3). 

Our broad-scale field surveys did not allow a definitive test of causa- 
tion for the rapid regional ecological changes that were observed. 
Regardless of the causes, however, a critical feature is that the short- 
term effects of the bleaching may have been masked in some cases. 
For example, we observed an increase in fish species richness on a reef 
in the Swains area, despite concurrent coral devastation (albeit highly 
localized in a region that was otherwise little affected by bleaching). 
Likewise, at the regional scale, local fish species richness increased on 
40% of the surveyed reefs, despite mass bleaching, net coral loss and an 
overall decline in regional species richness. Such trends appear remarkable, 
given that a reduction in fish species richness has been amongst 
the most consistently and rapidly observed local ecological responses 
to coral loss in previous studies. 

The observed regional-scale reshuffling and trophic reorganiza- 
tion appear to be extremely rapid, observable less than one year after 

Fig. 3 | Changes in the frequency of fish functional groups with 
differing temperature affinities on reef surveys following the bleaching 
event. a, Reefs north of 12° S latitude. Corallivores and scraping 
herbivores, the coolest-affinity trophic groups (lower STI values, on 
average), declined in frequency, while excavators (higher STI values, 
on average) increased in frequency on transects at northern reefs. 
n = 321 species in GBR, 301 species in Coral Sea. b, Reefs south of 19° S 
latitude. Excavators became more common on transects at southern 
reefs in the Coral Sea. n = 320 species in GBR, 305 species in Coral Sea. 
Points are means of species in each trophic group, shown separately for 
species recorded in the GBR (blue) and Coral Sea (orange). Data are 
mean ± s.e.m. 


the bleaching event. Rapid changes have previously been noted, such 
as increasing densities of herbivores, and have been hypothesized 
to be due to redistribution on reefs rather than to a demographic 
response. A substantial proportion of pre-bleaching surveys were 
undertaken in 2013 (ranging from 2010 to 2015), and many of the 
observed changes could have resulted from a number of consecutive 
warm years, rather than the single 2016 bleaching event. In addition to 
the 2016 event, the study period included two of the next nine warmest 
years on record for the GBR region (http://www.bom.gov.au/climate/ 
change/; accessed September 2017). The observed patterns may thus 
in part represent accumulated responses over multiple exceptionally 
warm years, and could provide valuable signs of the potential trajectory 
of ecosystem change for a warmer future with increasingly prevalent 
extreme events. 

Our observations of ecological change over an extreme heating event, 
with ecosystem consequences that are at least in part independent 
of coral mortality, may help to explain a lack of consistency among 
responses to bleaching observed in prior studies. For example, 
variability has previously been noted in herbivore responses, despite 
relatively consistent increases in algal resources following coral death. 
This response is of critical importance, as the biomass of herbivorous 
fishes can be highly influential in determining the recovery trajectories 
of bleached reefs. Scraping herbivores are considered to be particularly 
important for supporting reef recovery, and this group declined on 
northern reefs in our study. Whether losses of scraping herbivorous 
fishes in the northern GBR and Coral Sea will affect recovery of some 
of the most impacted reefs in the region remains an important question. 

Ecosystem impacts of coral loss are likely to increase during the next 
decade in the GBR and Coral Sea if widespread erosion of dead corals 
occurs. The extent to which the 2016 mass bleaching event 
proves ecologically catastrophic remains uncertain, as does the sum 
of accumulated effects from multiple bleaching events (as highlighted 
elsewhere). However, rapid local recovery may occur on some 
reefs. Either way, the trajectories of bleached reefs will be greatly 
influenced by the new community structures that we observed during 
a critical stage of reef recovery, and are thus inextricably linked with 
warming-related reshuffling of reef communities. 

Overall, our results highlight the need for managers and researchers 
to consider broad spatial and temporal responses to marine heating 
events amongst fishes and other biota, beyond the more readily 
observable impacts on coral habitat. For example, potential ecological 
consequences of the changes that were observed in the northern GBR 
and Coral Sea could be exacerbated if herbivorous fishes were targeted 
by fisheries in these regions, whereas equivalent herbivore exploitation 
may not be an urgent management concern in locations where gains in 
herbivores occur (such as the southern GBR in our study). Likewise, 
functional changes in fish and invertebrate communities driven by 
extreme events may either complement or work against efforts to save 
reefs through restoration and assisted evolution of corals. Geographical 
location has been recognized as an important input into conservation 
planning and management from the perspective of considering patterns 
in ocean thermal regimes. Our study highlights how location can 
additionally be important from the perspective of thermal affinities of 
community members. Accounting for the realized thermal niches of 
species in key functional groups may allow managers to more explicitly 
consider the trade-off between managing areas in which more species 
and functional groups are vulnerable to warming events, versus those in 
which fewer negative effects are expected. The former could potentially 
prolong local persistence of species and ecological stability by removing 
extractive pressures, and the latter may provide important reference 
areas for determining the importance of novel ecological interactions 
in shaping future reef ecosystems. 

METHODS 


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not blinded to 
allocation during experiments and outcome assessment. 

Survey methods. Standardized data were obtained on fishes, mobile invertebrates, 
coral and algae along 768 50-m underwater transects by trained scientific and 
recreational divers who participated in the citizen science Reef Life Survey (RLS) 
program. Full details of census methods are provided elsewhere, and an online 
methods manual (https://reeflifesurvey.com) describes all data-collection methods. 
Data quality and training of divers have previously been described. All observed 
fish species were counted in duplicate 5-m-wide transect blocks and aggregated as 
densities per 500-m² transect, and cryptic fishes and mobile invertebrates >2.5 cm 
total length were counted in duplicate 1-m-wide transect blocks (aggregated to 
100-m² transect area) on the same transect lines. Fish length and abundance 
estimates were converted to biomass using species-specific length–weight 
coefficients obtained from 
FishBase (https://fishbase.org), as used in previous studies with the RLS data. 
Invertebrate classes used for this study were Asteroidea, Cephalopoda, Crinoidea, 
Echinoidea, Gastropoda, Holothuroidea and Malacostraca. All individuals from 
these classes exceeding 2.5 cm total length were included in richness estimates for 
invertebrates, and in functional richness analyses. 
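
As a minimal illustration of the length-to-biomass conversion, the following Python sketch applies the standard allometric relationship W = a × L^b to counts of fish in length classes. The coefficient values and the example transect are hypothetical and are not taken from FishBase or from the survey data.

```python
def fish_biomass_g(length_cm: float, count: int, a: float, b: float) -> float:
    """Biomass (g) of `count` fish of one length class, using W = a * L**b."""
    return count * a * length_cm ** b


# Hypothetical length-weight coefficients for a single species.
a, b = 0.02, 3.0

# Hypothetical records for one transect: (total length in cm, individuals).
transect = [(10.0, 4), (20.0, 1)]

total_g = sum(fish_biomass_g(length, n, a, b) for length, n in transect)
# total_g is about 240 g for this example
```

In practice each species would carry its own pair of coefficients, and the per-species totals would be summed over the 500-m² transect area.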

Photoquadrats were taken vertically downward of the substrate every 2.5 m 
along each of the same transect lines, and later scored using a grid overlay of 
5 points per image, 100 points per transect. Categories of benthic cover scored were 
from a set of 50 morphological and functional groups of algae and corals (Extended 
Data Table 1), as previously described and aligning with the standard Australian 
hierarchical benthic classification scheme. Analyses undertaken for this study 
were based on the sum of all live hard coral categories (that is, percentage of live 
hard coral per transect), and the sum of all algal categories (percentage of algal 
cover per transect), with categories listed in Extended Data Table 1. 
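
The point-count scoring described above reduces to a simple proportion. The sketch below is illustrative only: the category labels are hypothetical stand-ins for the groups in Extended Data Table 1, and the point labels are invented.

```python
from collections import Counter


def percent_cover(point_labels, categories):
    """Percentage of scored points whose label falls in `categories`."""
    counts = Counter(point_labels)
    hits = sum(counts[c] for c in categories)
    return 100.0 * hits / len(point_labels)


# Hypothetical labels for one transect: 20 images x 5 points = 100 points.
labels = (["branching_coral"] * 18 + ["massive_coral"] * 12
          + ["turf_algae"] * 30 + ["sand"] * 40)

# Live hard coral is the sum over all hard-coral categories.
hard_coral = percent_cover(labels, {"branching_coral", "massive_coral"})
algae = percent_cover(labels, {"turf_algae"})
```
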
Survey design. Matching before–after bleaching surveys were undertaken at 186 
GPS-referenced sites at 53 reefs (see Fig. 1 for distribution of reefs; mean = 3.5 sites 
per reef) along the full length of the GBR and western Coral Sea region within the 
Australian Exclusive Economic Zone. At each site, multiple surveys (mean = 2.1 
transects per site) were undertaken at different depths, with transects laid along a 
depth contour. Depths were binned (see ‘Covariates’), such that the site-by-depth 
bin was the level of replication, making 233 matching site-by-depth replicates sur- 
veyed both before and after the bleaching event. 

Different divers often surveyed the fishes and the invertebrates along the 
same transect line. Pre-bleaching surveys were mostly undertaken from a survey 
cruise along the entire GBR and Coral Sea in 2015 (42% of pre-bleaching surveys) 
and a previous survey cruise through the GBR and Coral Sea in 2013 (39% of 
pre-bleaching surveys). Additional ‘before’ data (19%) were collected at Lizard 
Island, Great Keppel Island and the Whitsundays in 2010, and some sites in the 
central Coral Sea and GBR in 2012. All post-bleaching data were collected during 
a survey cruise through the entire region from November 2016 to March 2017. 
No strong biases were apparent in the interval between pre- and post-bleaching 
surveys along the latitudinal gradient or locations experiencing different heating 
anomalies (Extended Data Fig. 8). 

Seven of the 10 divers who undertook pre-bleaching surveys also undertook 
post-bleaching surveys, and G.J.E. and R.D.S.-S. together undertook 45% of all fish 
surveys (and led 85% of survey voyages before and 92% after the bleaching event). 
There was thus a substantial element of consistency in divers during the study. To 
explore the effect of different divers undertaking surveys at different times, how- 
ever, we reran the models for Fig. 2 and Extended Data Fig. 4 with ‘diver’ included 
as a random effect. This resulted in no changes in the effect sizes or conclusions. 
Therefore, results are presented for models without the diver effect, so that mar- 
ginal uncertainty intervals include site-to-site variation but not observer variation. 
Species traits. All fishes and invertebrates were allocated into one of the following 
trophic groups: corallivores, scraping herbivores, benthic invertivores, algal farm- 
ers, browsing herbivores, omnivores, planktivores, higher carnivores, excavators, 
detritivores, suspension feeders and cleaners. Additional traits used for calculation 
of functional richness were: maximum body size (included as 10-cm bins up to 
50 cm, and all species which grow to >50 cm binned together), and water column 
position (benthic, demersal, pelagic site-attached and pelagic non-site-attached). 
All traits were taken from a previously published dataset. Functional richness 
was calculated as the richness of functional entities per 50-m transect, in which 
all species with the same combination of trait levels for those three traits were 
considered functionally equivalent. 
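
One way to picture this calculation is as a count of distinct trait triplets per transect. In the Python sketch below, the species names and trait values are hypothetical, and the treatment of bin boundaries (for example, whether exactly 10 cm falls in the first bin) is an assumption that the text does not specify.

```python
import math


def size_bin(max_length_cm: float) -> str:
    """10-cm bins up to 50 cm; all larger species share a single bin."""
    if max_length_cm > 50:
        return ">50"
    upper = max(10, math.ceil(max_length_cm / 10) * 10)
    return f"{upper - 10}-{upper}"


def functional_richness(species_on_transect, traits) -> int:
    """Number of unique (trophic group, size bin, position) combinations."""
    entities = {
        (traits[s]["trophic"], size_bin(traits[s]["max_cm"]), traits[s]["position"])
        for s in species_on_transect
    }
    return len(entities)


# Hypothetical trait table.
traits = {
    "sp_a": {"trophic": "scraping herbivore", "max_cm": 35, "position": "demersal"},
    "sp_b": {"trophic": "scraping herbivore", "max_cm": 38, "position": "demersal"},
    "sp_c": {"trophic": "planktivore", "max_cm": 12, "position": "pelagic site-attached"},
}

# sp_a and sp_b share all three trait levels, so they count as one entity.
richness = functional_richness(["sp_a", "sp_b", "sp_c"], traits)
```
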

STI values were taken for each species from a previously published dataset, 
and represent the midpoint between the 5th and 95th percentile of local mean 
sea-surface temperature values from all occurrence locations of the species. The STI 
thus represents the centre of each species’ range when expressed as a range of sea 
temperatures experienced across its distribution, and provides a nuanced means 
of ordering species by their preferences for warmer or cooler environments. Full 
details, including discussion of strengths and weaknesses, are provided in previous 
publications. 
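
The STI definition above can be sketched in a few lines of Python. This is an illustration rather than the published procedure: the percentile interpolation rule (the `inclusive` method of the standard library) is an assumption, and the temperature values are invented.

```python
from statistics import quantiles


def species_temperature_index(sst_values):
    """Midpoint of the 5th and 95th percentiles of occurrence-site SST (°C)."""
    cuts = quantiles(sst_values, n=100, method="inclusive")
    p5, p95 = cuts[4], cuts[94]   # 5th and 95th percentile cut points
    return (p5 + p95) / 2.0


# Hypothetical mean sea-surface temperatures at 101 occurrence locations,
# evenly spread from 18.0 to 28.0 °C.
sst = [18.0 + 0.1 * k for k in range(101)]
sti = species_temperature_index(sst)   # about 23 °C for this example
```

Using the 5th and 95th percentiles rather than the extremes makes the midpoint less sensitive to a few outlying occurrence records.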

Covariates. The mean depth contour of each reef transect was recorded by divers 
during surveys, with surveys then allocated into three depth bins (<4 m, 4-10 m 
and >10 m). For any before–after comparisons, we first obtained the mean values 
of univariate responses taken from among all transects within each depth bin at 
a given site (that is, site-by-depth bin combinations). This gave 233 site-by-depth 
combinations, with a mean of 76 sites and 35.3 reefs per depth class. For each site, 
we also applied a four-level categorical measure for wave exposure: (1) sheltered, 
with only wind waves from non-prevailing direction; (2) wind-generated waves 
from the prevailing direction; (3) exposed to ocean swells, either indirectly with 
exposure to prevailing winds, or directly but sheltered from prevailing winds; or 
(4) exposed to open ocean swell from prevailing direction. There was a mean of 
62 sites and 24 reefs per exposure category. Reef habitat categories are often used 
for ecological studies of coral reefs (for example, slope, crest, flat and lagoon), but 
delineation between similar or adjacent habitats can sometimes be difficult. Instead 
of making these delineations for our survey sites, we considered that these two 
environmental axes of wave exposure and depth together appropriately capture 
the important variation between such reef habitat classifications with respect to 
their importance in describing potential for bleaching. 
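
The depth binning and site-by-depth averaging can be sketched as follows. The site names and values are hypothetical, and placing depths of exactly 4 m and 10 m in the middle bin is an assumption.

```python
from collections import defaultdict
from statistics import mean


def depth_bin(depth_m: float) -> str:
    """Allocate a transect depth to one of the three depth bins."""
    if depth_m < 4:
        return "<4 m"
    if depth_m <= 10:
        return "4-10 m"
    return ">10 m"


def site_by_depth_means(transects):
    """Average a response over transects within each (site, depth bin)."""
    groups = defaultdict(list)
    for site, depth_m, value in transects:
        groups[(site, depth_bin(depth_m))].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical transects: (site, mean depth in m, fish species richness).
transects = [("site_1", 3.0, 20.0), ("site_1", 3.5, 30.0), ("site_1", 8.0, 12.0)]
replicates = site_by_depth_means(transects)
```

Each key of `replicates` corresponds to one site-by-depth combination, the unit of replication in the before–after comparisons.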

Sea-surface temperature anomalies used in analyses relating coral-cover change 
to degree heating days (DHD) were obtained from the Reef Temp Next Generation product. 
Fine-scale anomalies for the period of January–March 2016 were matched to survey 
sites. 
Analysis of coral- and algal-cover change. We modelled the response of change 
in coral and algal cover as a function of DHD using a Bayesian mixed-effects 
model (n = 211 site-depth combinations where benthic-cover data were available). 
Additional fixed covariates included the depth of survey, the four wave exposure 
categories, a factor for whether the survey was in the GBR or the Coral Sea, the 
initial cover of corals or algae, an interaction between DHD and depth and an 
interaction between DHD and initial cover. We included a random effect for reef. 
We did not include a random effect for sites nested within reefs because only 36 had 
measurements at more than one depth across both time periods (before and after 
bleaching). Change in coral and algal cover was modelled with Gaussian errors and 
standard model checks confirmed that this assumption was appropriate. We scaled 
the variance of the model by the number of years between before and after surveys 
(maximum = 7 years, mean = 3.3 years), because we expect greater variance in the 
measured change in coral cover when those measurements were taken a longer 
time apart. We compared models with and without the variance scaling using the 
widely applicable information criterion (WAIC). The WAIC indicated that, 
for the coral-cover model, variance scaling provided an enhanced fit to the data 
(1,658 versus 1,721), whereas for the algae-cover model the unscaled model had 
an enhanced fit (1,801 versus 1,806), so we present results from these best models. 
However, the estimated effects of the covariates were nearly identical regardless 
of model used in both cases. 
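
For readers unfamiliar with the WAIC, the following Python sketch shows the usual definition on the deviance scale, computed from posterior samples of the pointwise log-likelihood, followed by the model choice implied by the values reported above (lower is better). The `waic` function is a generic illustration and not the INLA implementation; the example log-likelihood values are invented.

```python
import math
from statistics import mean, variance


def waic(log_lik):
    """WAIC from a list with one list of posterior log-likelihood samples
    per observation (at least two samples each)."""
    # Log pointwise predictive density.
    lppd = sum(math.log(mean([math.exp(v) for v in obs])) for obs in log_lik)
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic = sum(variance(obs) for obs in log_lik)
    return -2.0 * (lppd - p_waic)


# Invented example: two observations, two posterior samples each.
example = waic([[-1.0, -1.0], [-2.0, -2.0]])

# WAIC values reported in the text for the four fitted models.
coral = {"scaled": 1658, "unscaled": 1721}
algae = {"scaled": 1806, "unscaled": 1801}
best_coral = min(coral, key=coral.get)   # "scaled"
best_algae = min(algae, key=algae.get)   # "unscaled"
```
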

We present the median estimated effects of DHD on coral and algal cover in 
Extended Data Fig. 1, and credible intervals are 95% quantiles. We also predict 
median change and the 95% marginal credible intervals for change in coral and 
algal cover across the range of DHD for each depth category, for reefs with low 
wave exposure in the GBR (Fig. 1d). Credible intervals for predictions were 
integrated across all random effects, so they should be interpreted as effect sizes relative 
to variation across reefs. For the coral model, there was a strong interaction effect 
of initial coral cover with DHD, so we separately plotted predicted effects for the 
mean initial coral cover and ±1 s.d. in initial coral cover. 

We fitted the Bayesian mixed-effects models using the INLA framework 
implemented in the R programming language using the INLA R package (version 
17.06.20; http://r-inla.org, accessed 4 October 2017). The prior for the precision on 
the random effect used the log-gamma prior with shape = 1 and rate=1 x 10~°, 
although use of other standard priors did not change the results. Priors for fixed 
effects had mean = 0 and precision = 0.001. 

Mapped coral change values in Fig. 1a represent absolute change in live hard 
coral cover at each site, with the change values interpolated using inverse-distance 
weighting and a buffer of 50 km applied around each reef surveyed 
(implemented with the gstat package in the R program). Symbols on the map 
thus represent the locations of the reefs, although coral change values come from 
the aggregation of smaller scale data at individual sites within reefs. 
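
The interpolation used for the map can be pictured with the following Python sketch of inverse-distance weighting. It is not the gstat implementation: planar coordinates in kilometres, a distance power of 2, and the treatment of the 50-km buffer as a maximum search distance are all assumptions made for illustration, and the site values are invented.

```python
import math


def idw(target, samples, power=2.0, max_km=50.0):
    """Inverse-distance-weighted estimate at `target`.

    samples: list of ((x_km, y_km), value). Returns None if no sample
    lies within `max_km` of the target.
    """
    num = den = 0.0
    for (x, y), value in samples:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value            # target coincides with a sampled site
        if d <= max_km:
            w = 1.0 / d ** power    # nearer sites receive larger weights
            num += w * value
            den += w
    return num / den if den > 0 else None


# Hypothetical absolute changes in live coral cover (%) at two sites.
sites = [((0.0, 0.0), -10.0), ((10.0, 0.0), -30.0)]
midway = idw((5.0, 0.0), sites)        # equal weights, about -20
outside = idw((100.0, 100.0), sites)   # None: beyond the buffer
```
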
Comparison of bleaching-impacted and unaffected reefs. To isolate ecological 
impacts most likely arising from bleaching-associated coral loss, we used the fol- 
lowing criteria to define ‘bleaching-impacted’ reefs: (1) pre-bleaching live hard 
coral cover >20% on average (across all transects at the reef). This meant that 
the starting community was more likely to comprise coral-associated 
fish and invertebrate species; (2) loss of live coral cover >40% of pre-bleaching 
values, on average; and (3) experienced more than 40 DHD. These criteria were 
collectively used as a means to show the maximum likely impact of the loss of coral 
from bleaching, by ensuring there was adequate coral cover to start with (criterion 
1), and that coral losses were at least typical of the mortalities observed in other 
studies (criterion 2), while providing some confidence that observed coral loss 
was most likely attributable to bleaching (criterion 3). We cannot be certain about 
the latter (see comments below about other potential impacts on coral during the 
study period), but 40 DHD well exceeds the threshold for bleaching as identified 
previously for this same bleaching event. Reefs defined as ‘bleaching-impacted’ 
were widely dispersed along the GBR and Coral Sea (Extended Data Fig. 3). 

To provide an objective contrast with these reefs, we also selected reefs that were 
clearly unaffected by bleaching. We used the same criteria as above, but instead of 
losing at least 40% of live coral cover, we selected only those that experienced >1% 
mean gain in live hard corals on average (using the mean percentage of coral-cover 
change, rather than mean pre-bleaching minus mean post-bleaching cover). For all 
bleached and unaffected reefs, we examined responses in key metrics of the coral, 
fish and invertebrate communities, as shown in Extended Data Fig. 3. 
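
The selection criteria for the two groups of reefs can be summarized in a short Python sketch. The function and label names are hypothetical, and expressing the change in coral cover as a percentage of pre-bleaching cover for both thresholds is an assumption about how the criteria were applied.

```python
def classify_reef(pre_cover: float, relative_change_pct: float, dhd: float) -> str:
    """Classify a reef for the impacted-versus-unaffected comparison.

    pre_cover: mean pre-bleaching live hard coral cover (%).
    relative_change_pct: change in live hard coral as a percentage of
        pre-bleaching cover (negative values are losses).
    dhd: degree heating days experienced by the reef.
    """
    if pre_cover <= 20 or dhd <= 40:
        return "excluded"              # fails criterion 1 or criterion 3
    if relative_change_pct < -40:
        return "bleaching-impacted"    # criterion 2: loss >40% of initial cover
    if relative_change_pct > 1:
        return "unaffected"            # contrast group: >1% mean gain
    return "excluded"


# Hypothetical reefs.
impacted = classify_reef(pre_cover=35, relative_change_pct=-55, dhd=60)
unaffected = classify_reef(pre_cover=35, relative_change_pct=3, dhd=60)
low_cover = classify_reef(pre_cover=15, relative_change_pct=-80, dhd=60)
```
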

Regional community-structure change. Non-metric multidimensional scaling 
was undertaken separately on reef fish and invertebrate community data to show 
broad regional change in community structure and visualize consistencies in the 
direction of community change among regions between before and after surveys 
(Extended Data Fig. 5). Mean biomass per fish species (kg per 500 m²) and mean 
abundance per mobile invertebrate species (individuals per 100 m²) were calculated 
separately across all surveys within each 2° latitudinal band for the GBR and 
Coral Sea. Biomass and abundance data were log-transformed and Bray–Curtis 
dissimilarity matrices were used for ordination of each. The analysis was undertaken 
in PRIMER, with symbols subsequently colour-coded for data collected before 
and after the bleaching event, and labels to indicate GBR and Coral Sea regions. 
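
The dissimilarity measure underlying the ordination can be sketched in Python as follows. This is not the PRIMER implementation; in particular, the log(x + 1) form of the transformation is an assumption, because the text states only that data were log-transformed, and the example values are invented.

```python
import math


def log_transform(values):
    """log(x + 1) transform, which keeps zero values at zero."""
    return [math.log(x + 1.0) for x in values]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


# Hypothetical mean biomass of three species in two latitudinal bands.
band_north = log_transform([3.0, 0.0, 1.5])
band_south = log_transform([0.0, 3.0, 1.5])
d = bray_curtis(band_north, band_south)   # between 0 (identical) and 1
```
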
Analysis of regional-scale ecological changes. We analysed the response of 
nine fish and invertebrate metrics to bleaching using Bayesian generalized linear 
mixed-effects models (GLMMs). Each metric value on each survey was modelled 
with covariates for latitude, depth, coral cover, protection status (no-take ‘green’ 
zone versus all other zone types), GBR or Coral Sea, wave exposure, time (before 
or after the bleaching event) and an interaction between time and latitude. The 
interaction was included to allow for the possibility that latitudinal gradients in 
each metric changed from before to after the bleaching event. We included random 
effects for reefs and sites within reefs. 
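
The structure of these models can be written schematically as below. The notation is ours and is intended only as a sketch: i indexes the survey period, j the site and k the reef; the vector x collects the remaining covariates (depth, coral cover, protection status, region and wave exposure); u and v are the random effects for reefs and for sites within reefs; and g is the link function for the metric concerned.

```latex
g\!\left(\mu_{ijk}\right) = \beta_0
  + \beta_1\,\mathrm{lat}_{jk}
  + \beta_2\,\mathrm{time}_{i}
  + \beta_3\,\bigl(\mathrm{lat}_{jk}\times\mathrm{time}_{i}\bigr)
  + \mathbf{x}_{ijk}^{\top}\boldsymbol{\gamma}
  + u_{k} + v_{jk}
```

Here \mu_{ijk} is the mean of the modelled response, and \beta_3 is the interaction term that captures any change in the latitudinal slope between the two survey periods.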

We chose error distributions appropriate for each metric. These were: Poisson 
with a log-link for the richness metrics, log-normal with an identity link for the 
biomass metrics, and binomial with a logit link for urchin presence. We added 
0.5 to the logged biomass data so that zero values were not excluded. Checks of 
residuals confirmed that a log-normal distribution was appropriate for the biomass 
data. Rootograms and Dunn–Smyth residuals were used to confirm the count 
models were fitted appropriately. 
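
The transformation and link functions mentioned above can be illustrated with a few helper functions. These are generic definitions written for this example, with hypothetical names, and are not taken from the analysis code.

```python
import math


def log_biomass(biomass_g: float) -> float:
    """Log of biomass after adding 0.5, so that zero values are retained."""
    return math.log(biomass_g + 0.5)


def inv_log(eta: float) -> float:
    """Back-transform a log-link prediction to the count scale."""
    return math.exp(eta)


def inv_logit(eta: float) -> float:
    """Back-transform a logit-link prediction to a probability of presence."""
    return 1.0 / (1.0 + math.exp(-eta))


zero_biomass = log_biomass(0.0)   # defined, equal to log(0.5)
richness = inv_log(0.0)           # a linear predictor of 0 means 1 species
p_urchin = inv_logit(0.0)         # log-odds of 0 means a probability of 0.5
```
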

We used the INLA framework to fit the Bayesian GLMMs, using the same set- 
tings as for the coral change model. We give effect sizes as median effects of each 
covariate in Fig. 2 and Extended Data Fig. 4 with 95% credible intervals. Credible 
intervals that did not overlap zero in Extended Data Fig. 4 are identified by aster- 
isks. We also predict metrics across the latitudinal gradient before and after the 
bleaching event with marginal 95% credible intervals. Predictions across latitude 
were made for a reef of <4-m depth, with the mean level of hard coral cover, inside 
a protected area with low wave exposure and for the GBR. Thus positive and neg- 
ative effects in Extended Data Fig. 4 can be interpreted in relation to these levels 
of the relevant covariates. Choosing other covariate values for the baseline would 
affect the magnitude of the patterns but not the overall trend. 

Possible recruitment events. We tested whether patterns in richness before and 
after bleaching could be related to a coincident fish recruitment event. We analysed 
the mean richness of juvenile fishes per reef (29 reefs in total from extreme north 
and south) as a function of three binary covariates: before versus after bleaching, 
Coral Sea versus GBR and north versus south, using a linear model, implemented 
in the INLA framework from the R programming language. Juveniles were
defined as any individuals that were 10 cm or less, for species that exceed 12.5 cm 
in maximum size. No significant change in the richness of juveniles was evident 
before and after bleaching (mean difference = 1.70 with lower and upper 95% credible
intervals of −0.6 and 4.0), with the distribution of data shown in Extended
Data Fig. 7. 
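
The juvenile definition above can be expressed as a short Python sketch. The record layout (species, individual length, species maximum length) and the function names are hypothetical, and the sketch does not reproduce the per-reef averaging or the Bayesian linear model.

```python
def is_juvenile(length_cm, species_max_length_cm):
    """Juvenile: an individual of 10 cm or less, from a species whose
    maximum size exceeds 12.5 cm."""
    return length_cm <= 10.0 and species_max_length_cm > 12.5

def juvenile_richness(records):
    """Count distinct species with at least one juvenile record.

    `records` is an iterable of (species, length_cm, species_max_length_cm)
    tuples; this layout is an assumption of the sketch.
    """
    return len({species for species, length, max_len in records
                if is_juvenile(length, max_len)})

# Hypothetical survey: species B is small-bodied, so its individuals never
# count as juveniles; species C is represented only by a large individual.
example_records = [("A", 8.0, 30.0), ("A", 9.0, 30.0),
                   ("B", 9.0, 11.0), ("C", 15.0, 40.0)]
example_richness = juvenile_richness(example_records)
```

The size threshold on the species maximum keeps small-bodied adults from being counted as recruits.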

Other potential effects on results. Few trends in fishes and invertebrates were 
related to changes in coral cover when considered at the scale of the whole 
study region, and primary study conclusions do not rest on the assumption that
all observed coral mortality was driven by the 2016 bleaching event. Cyclones,
crown-of-thorns starfish, and pollution and sediment from riverine outputs are 
other potential impacts on corals across the region. We checked the database of 
past tropical cyclone tracks on the Bureau of Meteorology website (http://www. 
bom.gov.au/cyclone/history/index.shtml, accessed 7 April 2018) for intersection of 
cyclone tracks with our survey sites. Surveys were completed before cyclone Debbie 
(2017), and the only surveys done before cyclone Yasi (2011) (Lizard Island, Port 
Douglas, Whitsundays, Keppel) were in areas outside of the destructive path of this 
cyclone. However, cyclone Ita was reported to have impacts on corals in the Lizard 
Island area during the study period, and there is a possibility that other smaller
cyclones caused highly localized impacts. Thus, caution is required in ruling out 
cyclone damage as contributing to coral-cover changes observed in some locations. 
We cannot be certain that crown-of-thorns starfish did not affect coral cover at 
our sites in between surveys, but these are also recorded on the surveys of mobile 
invertebrates and were found in extremely low densities (mean = 1.4 individuals
per 50 m² when found, only at 15 sites). It is possible that a wave of crown-of-thorns
starfish came through and reduced live coral cover at a small number
of sites, but effects at so few sites would be unlikely to have a
detectable impact on results or conclusions of the study. Likewise, pollution and
sediment from riverine sources could not have been responsible for any changes in 
the Coral Sea (>250 km offshore), and would be unlikely to have impacted any sites 
other than a small number of inshore locations. No substantial pollution events 
(for example, oil spills) were noted near survey locations in the period. Regardless, 
care is required in inferring causality for observed coral-cover change in this study, 
and no assumption should be made that all coral loss was attributable to bleaching. 
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 
Data availability. Raw reef fish and invertebrate abundance data and photoquadrats
of coral cover are available online through the Reef Life Survey website:
https://reeflifesurvey.com.
Extended Data Fig. 1 | Results of GLMMs for changes in coral and
algal cover during the 2016 bleaching event. a, Changes in coral cover.
b, Changes in algal cover. Change in cover is modelled as a function
of the influences of starting cover (of corals and algae, respectively),
wave exposure, thermal anomaly (DHD), the interaction between DHD
and starting cover, depth category (depths between 4 and 10 m and
>10 m modelled in comparison to <4-m depth), and the interaction
between depth category and DHD (n = 211 site-depth combinations).
All continuous predictors were normalized to mean = 0 and s.d. = 1 for
comparative purposes.
Extended Data Fig. 2 | Changes in algal and coral cover spanning the
2016 bleaching event. a, Coral- and algal-cover change were negatively
correlated (ρ = −0.56). b, The greatest algal-cover increases occurred at
sites with the lowest coral cover after the bleaching event (ρ = −0.28).
n = 211 site-depth combinations.
Extended Data Fig. 3 | Ecological changes on surveyed reefs most
clearly affected by coral bleaching (red) versus un-impacted reefs
(blue). Reefs categorized as bleached were those with >20% pre-bleaching
live coral cover, that experienced >40 DHD and that lost >40% of pre-bleaching
coral cover (see Methods for rationale). The un-impacted reefs
were those that had >20% pre-bleaching live coral cover and experienced
>40 DHD, but did not show a reduction in coral cover. The vertical axis
is the percentage change of each metric across the reefs in each category
(n = 6 bleached, n = 5 unbleached reefs), and horizontal lines on box
plots show median, first and third quartiles, with the range indicated by
the error bars. Crosses indicate means and circles indicate individual
reefs within quartiles. Values for corallivores, browsing herbivores and
scraping herbivores describe change in densities of species in these groups.
Densities and species richness are means per 500 m² (fishes) or 100 m²
(invertebrates). Bleached and unbleached reefs each include reefs from
both northern and southern regions. Only coral cover differed noticeably
between these two groups of reefs (mean difference = −72%, with 95%
credible intervals of 25–107%), although there was a small decline in
corallivore densities post-bleaching (mean difference = 42% with 95%
credible intervals from −0.24 to 78.0).
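
The categorization rule in this legend can be sketched in Python as follows. The function and argument names are hypothetical, and the sketch assumes that "lost >40% of pre-bleaching coral cover" refers to a relative loss; reefs meeting neither definition are simply left uncategorized here.

```python
def classify_reef(pre_cover_pct, post_cover_pct, dhd):
    """Assign a reef to 'bleached', 'un-impacted' or 'not categorized'.

    Both categories require >20% pre-bleaching live coral cover and >40 DHD.
    Loss is taken relative to pre-bleaching cover (an assumption).
    """
    if pre_cover_pct <= 20 or dhd <= 40:
        return "not categorized"
    relative_loss_pct = 100.0 * (pre_cover_pct - post_cover_pct) / pre_cover_pct
    if relative_loss_pct > 40:
        return "bleached"
    if relative_loss_pct <= 0:
        return "un-impacted"
    return "not categorized"

# Hypothetical reefs: (pre-bleaching cover %, post-bleaching cover %, DHD).
labels = [classify_reef(50, 20, 60),   # lost 60% of its cover
          classify_reef(50, 55, 60),   # cover did not decline
          classify_reef(50, 40, 60),   # lost 20%: falls between categories
          classify_reef(15, 5, 60)]    # too little starting cover
```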
Extended Data Fig. 4 | Effect sizes from GLMMs of regional change
for each ecological metric. Median additive effects of each covariate
on the linear expectation for each metric (with 95% credible intervals as
error bars) (n = 233 site-by-depth-category combinations). Effect sizes
are on a log scale for all metrics, except for sea urchin presence, which
gives the effect on the log-odds of presence versus absence. The influences
of latitude, and its change from before to after the bleaching event (the
interaction between latitude and bleaching (Latitude * bleaching)), are
modelled in relation to differences between the GBR and Coral Sea reefs
(GBR), wave exposure (Exposure), depth of the survey (depths between
4 and 10 m and >10 m modelled in comparison to <4-m depth), the
percentage cover of live hard corals in the survey (Coral cover) and before
versus after the bleaching event (After bleaching). Effects for which
credible intervals do not overlap zero are indicated with black, rather than
grey, points and error bars.
Extended Data Fig. 5 | Non-metric multidimensional scaling plots
for reef fish and mobile invertebrate communities along the GBR and
Coral Sea. Fish biomass data (top) and invertebrate abundance data
(bottom) were averaged across surveys within 2° latitudinal bands, with
number labels representing the northern latitude (that is, 21 represents
the 2° band from 21° to 23° south). Coral Sea reefs are distinguished from
those in the GBR by a ‘C’ in the label. Symbols have been colour-coded
for data collected before and after the bleaching event (n = 13 latitudinal
bands each before and after).
Extended Data Fig. 6 | Changes in the trophic structure of reef fishes
following the 2016 mass bleaching event on the GBR and Coral Sea. 
Bars represent the proportion of total biomass made up by each trophic 
group, averaged across surveys on each reef, and reefs ordered by 
latitude. Cleaners and algal farmers were removed owing to their small 
contributions to biomass. 
Extended Data Fig. 7 | Local species richness of juvenile fishes (per
500 m²) before and after the 2016 mass bleaching event on the GBR
and Coral Sea. Species richness is shown before (blue) and after (red) the
2016 mass bleaching event on the GBR (left) and Coral Sea (right). ‘North’
reefs were north of 12° S (n = 10 reefs), and ‘south’ reefs were south of 19° S
(n = 19 reefs). Juveniles were classified as any individuals 10 cm or less,
for species that exceed 12.5 cm in maximum size. A Bayesian linear model
indicated juvenile richness differed between the GBR and Coral Sea, but
not between north and south or before and after the bleaching event (mean
difference = 1.70 with lower and upper 95% credible intervals of −0.6
to 4.0). The distribution of raw data is shown in box plots, with crosses
indicating means and circles indicating individual reefs within quartiles.
Extended Data Fig. 8 | The distribution of sampling effort through
space and time. The temporal gap between pre- and post-bleaching
surveys (n = 768 surveys total) between GBR and Coral Sea, along
the latitudinal gradient, and locations experiencing different heating
anomalies. For the box plot, the box shows the interquartile range and
whiskers are 1.5 × the interquartile range.
Extended Data Table 1 | Categories of coral and algal cover scored from photoquadrats

Category  Functional group                       Coral Sea    Coral Sea
                                                 Before       After
Algae     Caulerpa spp.                          3.6          46
Algae     Crustose coralline algae               21.1         26.1
Algae     Diatom/cyanobacterial slime            5.6          46
Algae     Encrusting leathery algae              4.5          3.5
Algae     Filamentous algae                      22           1.6
Algae     Foliose red algae                      1.0          2.0
Algae     Geniculate coralline algae             4.0          3.9
Algae     Green calcified algae                  16.7         20.0
Algae     Other foliose green algae              4.9          8.5
Algae     Small to medium foliose brown algae    15           15.4
Algae     Turfing algae (<2 cm)                  15.3         17.8
Coral     Ahermatypic corals                     1.8          1.5
Coral     Bleached coral                         3.8          3.1
Coral     Bottlebrush corals                     5.7          16.2
Coral     Branching Acropora                     9.9          6.4
Coral     Columnar corals                        2.0          7.3
Coral     Corymbose corals                       2.9          2.8
Coral     Digitate corals                        3.8          1.4
Coral     Encrusting corals                      9.7          7.9
Coral     Foliose corals                         4.9          6.9
Coral     Hydrocoral                             2.3          5.4
Coral     Large-polyp stony corals               1.5          1.2
Coral     Massive corals                         5.0          4.5
Coral     Organ-pipe coral (Tubipora)            1.0          1.3
Coral     Other branching/erect corals           5.1          8.7
Coral     Pocillopora                            3.3          3.2
Coral     Soft corals and gorgonians             8.3          8.1
Coral     Submassive corals                      35           2.8
Coral     Tabular coral                          4.7          3.4

Great Barrier Reef, Before: 18.8; 12.0; 7.3; 2.6; 7.8; 2.9; 9.6
Great Barrier Reef, After: 11.4; 15.3; 10.9; 3.1; 7.1; 4.8; 12.8
Mean percentage cover values from before and after the bleaching event are shown for sites in the Coral Sea and GBR. Categories are for live cover, with dead or bleached individuals or colonies scored
as such (bleached corals were scored if white at the time of surveys, and only summed bleached corals were included here). Soft corals were excluded from summed cover for analysis of live hard
coral.
Differential tuning of excitation and inhibition
shapes direction selectivity in ferret visual cortex

Daniel E. Wilson, Benjamin Scholl & David Fitzpatrick


To encode specific sensory inputs, cortical neurons must generate 
selective responses for distinct stimulus features. In principle, 
a variety of factors can contribute to the response selectivity of 
a cortical neuron: the tuning and strength of excitatory and
inhibitory synaptic inputs, dendritic nonlinearities and spike
threshold. Here we use a combination of techniques including in
vivo whole-cell recording, synaptic- and cellular-resolution in vivo
two-photon calcium imaging, and GABA (γ-aminobutyric acid)
neuron-selective optogenetic manipulation to dissect the factors that
contribute to the direction-selective responses of layer 2/3 neurons in
ferret visual cortex (V1). Two-photon calcium imaging of dendritic
spines revealed that each neuron receives a mixture of excitatory
synaptic inputs selective for the somatic preferred or null direction 
of motion. The relative number of preferred- and null-tuned 
excitatory inputs predicted a neuron’s somatic direction preference, 
but failed to account for the degree of direction selectivity. By 
contrast, in vivo whole-cell patch-clamp recordings revealed a 
notable degree of direction selectivity in subthreshold responses 
that was significantly correlated with spiking direction selectivity. 
Subthreshold direction selectivity was predicted by the magnitude 
and variance of the response to the null direction of motion, and 
several lines of evidence, including conductance measurements, 
demonstrate that differential tuning of excitation and inhibition 
suppresses responses to the null direction of motion. Consistent 
with this idea, optogenetic inactivation of GABAergic neurons 
in layer 2/3 reduced direction selectivity by enhancing responses 
to the null direction. Furthermore, by optogenetically mapping 
connections of inhibitory neurons in layer 2/3 in vivo, we find 
that layer 2/3 inhibitory neurons make long-range, intercolumnar 
projections to excitatory neurons that prefer the opposite direction 
of motion. We conclude that intracortical inhibition exerts a major 
influence on the degree of direction selectivity in layer 2/3 of ferret 
V1 by suppressing responses to the null direction of motion. 

Using sparse expression of the ultrasensitive fluorescent indicator 
protein GCaMP6s and two-photon calcium imaging, we first determined
a neuron’s somatic preferred direction using drifting grating
stimuli (example cell, Fig. 1a) and then presented the somatic preferred 
and null directions while imaging spines on apical and basal dendrites. 
Individual dendritic branches contained spines that responded 
preferentially to either the somatic preferred or null directions (example 
branch, Fig. 1b); this diversity existed throughout the dendritic fields 
of single neurons (Fig. 1c). By quantifying spine responses using a 
direction selectivity index (DSI, see Methods) we found that spines 
tuned for the somatic preferred and null directions had different DSI 
values (Fig. 1d; Wilcoxon rank-sum, P = 0.002, n = 384 preferred 
spines, n = 233 null spines from n = 17 cells from 10 animals), and in 8 
of 12 direction-selective cells (soma DSI > 0.3), more spines were tuned 
to the somatic preferred direction than to the null direction (Fig. 1e).
Similar fractions of spines from basal (300 of 498) and apical (84 of
119) dendrites responded more strongly to the somatic preferred direction
(60.2% and 70.6%, respectively). We then computed bootstrapped
sums of normalized spine responses (see Methods) to assess the relation
between the DSI of a neuron’s inputs and of its soma. Summed excitatory
synaptic input was weakly tuned (summed spine DSI = 0.1 ± 0.1,
median ± interquartile range (IQR), n = 17 cells) and we found no correlation
between summed spine and somatic DSI across our sample
(Fig. 1f; r = −0.08, P = 0.75, n = 17 cells), regardless of how we assessed
synaptic input direction selectivity (see Methods, Extended Data 
Fig. 1a, b). Together, these data emphasize that the functional specific- 
ity of excitatory synaptic inputs converging onto individual layer 2/3 
neurons is sufficient to account for somatic direction preference, but 
fails to explain the degree of somatic direction selectivity. 
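
To make the summed-input comparison concrete, the following Python sketch computes a direction selectivity index for individual spines and for their summed response. The formula DSI = (Rpref − Rnull)/(Rpref + Rnull) is a commonly used definition and is an assumption here, because the exact definition is given in the Methods rather than in this section; the spine responses are invented values, and the bootstrapping and normalization steps are omitted.

```python
def direction_selectivity_index(r_pref, r_null):
    """DSI = (r_pref - r_null) / (r_pref + r_null); assumed definition.

    Returns 0.0 when there is no response in either direction.
    """
    total = r_pref + r_null
    if total <= 0:
        return 0.0
    return (r_pref - r_null) / total

# Hypothetical spines: (response to somatic preferred, response to null).
# Three spines favour the somatic preferred direction, two favour the null.
spines = [(1.0, 0.2), (0.9, 0.3), (0.2, 1.0), (1.0, 0.1), (0.3, 0.8)]

per_spine_dsi = [direction_selectivity_index(p, n) for p, n in spines]

# Sum responses across spines before computing the index.
summed_pref = sum(p for p, _ in spines)
summed_null = sum(n for _, n in spines)
summed_dsi = direction_selectivity_index(summed_pref, summed_null)
```

In this toy example each spine is individually well tuned, yet the summed input is only weakly selective, which is the qualitative pattern described for the summed spine DSI.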

Previous work has suggested that spike threshold amplifies weak
biases in excitatory synaptic inputs to enhance the spiking direction
selectivity of neurons in layer 4 of cat V1. To investigate whether
subthreshold membrane potential (Vm) responses reflect the weak
biases in excitatory inputs demonstrated by our spine imaging, we
made whole-cell patch-clamp recordings from layer 2/3 neurons using
a K⁺-based internal solution (n = 76 cells from 23 animals; example,
Fig. 2a, b). Most cells (78%, n = 54 of 69) showed strong spiking direction
selectivity (Extended Data Fig. 2). Unexpectedly, nearly half of
the cells with direction-selective spiking responses (48%, n = 26 of 54)
showed strong direction tuning in their Vm (Fig. 2c), and the degree
of Vm selectivity was correlated with spiking selectivity (n = 69 cells,
r = 0.56, P = 5.67 × 10⁻⁷). The strong direction selectivity evident in
Vm responses is in sharp contrast to the weak selectivity predicted by
the distribution of excitatory synaptic inputs and forced us to consider
factors that might contribute to the strong direction tuning of Vm
responses. In principle, the emergence of strong subthreshold tuning
from broadly tuned excitatory inputs could reflect mechanisms that
enhance the effectiveness of excitatory inputs tuned to the preferred
direction, diminish the effectiveness of excitatory inputs tuned to the
null direction or a combination of these factors. To distinguish among
these alternatives, we investigated whether there was a consistent relationship
between subthreshold selectivity and subthreshold response
amplitude to preferred and null direction stimuli. We found no correlation
between Vm DSI and subthreshold response amplitude to the
preferred direction (Fig. 2d; r = −0.001, P = 0.99, n = 76). By contrast,
we found a strong anticorrelation between Vm DSI and null direction
response amplitude (Fig. 2e; r = −0.69, P = 5.07 × 10⁻¹², n = 76). These
results indicate that factors that influence null direction responses are
important in determining Vm selectivity.

We then considered the degree to which inhibitory inputs contribute
to subthreshold responses to the null direction of motion.
Theoretical models predict that when levels of inhibition are high
relative to excitation, not only is there a reduction in the level of depolarization,
but also a reduction in the ‘noise’ or Vm variability. We
therefore examined the relationship between subthreshold DSI and Vm
noise (see Methods) for the preferred direction and observed no significant
correlation (Fig. 2f, r = −0.10, P = 0.39, n = 76 cells). Instead,
we uncovered a strong anticorrelation between subthreshold DSI and
Vm noise at the null direction (Extended Data Fig. 3; Fig. 2g, r = −0.61,
Fig. 1 | Direction tuning of excitatory synaptic inputs onto layer 2/3
neurons in ferret V1. a, Example soma tuning (data, mean ± s.e.m.; scale
bar, 10 μm). b, Left, example dendritic spines (n = 11) pseudocoloured
for direction preference (scale bar, 10 μm). Right, trial-averaged spine
responses to somatic preferred or null directions. c, Trial-averaged


P = 6.94 × 10⁻⁹, n = 76 cells), consistent with a significant role for
inhibition to the null direction in shaping the subthreshold direction 
selectivity of layer 2/3 neurons. 

The idea that inhibition contributes to direction selectivity contrasts
with a number of demonstrations that inhibition normalizes cortical
activity. These studies showed that excitation and inhibition are
generally co-tuned, whereas our observations suggest that the
tuning of excitatory and inhibitory inputs onto direction-selective neurons
is dissimilar. To directly measure excitatory (Ge) and inhibitory
(Gi) synaptic conductances underlying direction selectivity, we performed
whole-cell patch-clamp recordings using a Cs⁺-based internal
solution and recorded Vm responses to drifting gratings at different
current steps to extract synaptic conductances and their direction tuning
(Fig. 3a, b; see Methods). We observed a wide range of direction selectivity
in synaptic conductances and found that excitatory and inhibitory
DSIs were not correlated (Fig. 3c, r = 0.043, P = 0.91, n = 10 cells from 7
animals) and therefore not co-tuned. In half of our recorded neurons,
excitation and inhibition preferred opposite directions (Δθ > 135°) and
across the population there was a significant bias towards preferring
opposite directions (Extended Data Fig. 4a, Monte Carlo significance
test, P = 0.023). Despite a lack of co-tuning, excitation and inhibition
shared similar tuning bandwidth (Extended Data Fig. 4b).

Understanding the effect of inhibitory conductances on subthreshold
responses requires consideration of co-occurring excitatory conductances.
Thus, we measured the relative strength of inhibition as the ratio
of inhibitory to excitatory conductance (Gi/Ge) and found that Gi/Ge
was systematically larger for null direction than for preferred direction
stimuli (Fig. 3e; Wilcoxon sign-rank, P = 0.037, n = 10). Moreover, the
direction selectivity of predicted Vm from empirically measured synaptic
conductances (population tuning curves in Extended Data Fig. 4c)
was significantly correlated with the Gi/Ge ratio at the null direction
(Fig. 3f; r = 0.81, P = 0.008, n = 10 cells from 7 animals) but not the
preferred direction (Extended Data Fig. 5). Consistent with our spine
imaging data (Fig. 1f), predicted Vm direction selectivity was not correlated
with the direction tuning of excitation alone (Fig. 3d; r = 0.49,
P = 0.15, n = 10). Our measurements of synaptic conductances suggest
that relatively stronger inhibitory input at the null direction enhances
somatic direction selectivity.
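
A minimal sketch of why a larger Gi/Ge ratio lowers the predicted membrane potential is given below, using the standard steady-state expression for a single-compartment conductance model. This is an illustrative assumption and not necessarily the procedure described in the Methods; the reversal potentials, leak conductance and conductance values are all hypothetical.

```python
def predicted_vm(g_e, g_i, g_leak=10.0, e_e=0.0, e_i=-80.0, e_leak=-70.0):
    """Steady-state membrane potential (mV) of a single-compartment model.

    Vm = (Ge*Ee + Gi*Ei + Gleak*Eleak) / (Ge + Gi + Gleak)
    Conductances are in nS and reversal potentials in mV; the default
    values are hypothetical and chosen only for illustration.
    """
    total = g_e + g_i + g_leak
    return (g_e * e_e + g_i * e_i + g_leak * e_leak) / total

# Same excitatory conductance in both directions, but inhibition is
# relatively stronger (higher Gi/Ge) at the null direction.
vm_pref = predicted_vm(g_e=5.0, g_i=5.0)    # Gi/Ge = 1
vm_null = predicted_vm(g_e=5.0, g_i=10.0)   # Gi/Ge = 2
```

With equal excitation, the direction with the larger Gi/Ge ratio yields the more hyperpolarized predicted Vm, so differential tuning alone can produce a subthreshold direction bias in this toy model.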


responses for all significantly responsive spines (n = 62) from cell in a. 
d, DSI distributions for preferred (n = 384) and null (n= 233) direction- 
tuned spines. e, Fraction of spines on direction-selective cells (n = 12) 
preferring somatic preferred and null directions. f, Relationship between 
somatic DSI and summed spine DSI (n= 17 cells from 10 animals). 


If relatively greater Gi/Ge at the null direction contributes to direction
selectivity in layer 2/3, inactivation of GABAergic neurons in layer 2/3
would reduce suppression at the null direction and reduce selectivity,
as suggested by previous pharmacological studies. To test this hypothesis,
we optogenetically suppressed layer 2/3 GABAergic neurons
by expressing GtACR2 under the control of the mouse Dlx5/6 (mDlx)
enhancer and measured direction selectivity using whole-cell patch-clamp
recordings with K⁺-based internal solution. Optogenetic inactivation
of GABAergic neurons (Extended Data Fig. 6) increased evoked
response amplitude (Fig. 3g) and reduced Vm DSI (Fig. 3h; P = 0.004,
Wilcoxon sign-rank, n = 16 cells from 4 animals) and spiking DSI
(Extended Data Fig. 7a). Notably, changes in subthreshold direction
selectivity were not related to the absolute Vm depolarization induced
by GABAergic photoinhibition in individual neurons (Extended Data
Fig. 7b). Instead, the degree to which null-direction responses were
modulated by GABAergic suppression (see Methods) depended on
the cell’s Vm DSI (Fig. 3j; r = 0.56, P = 0.025, n = 16 cells), whereas no
such relationship was observed for modulation of preferred-direction
responses (Fig. 3i; r = 0.20, P = 0.46, n = 16 cells). On the basis of these
results, we conclude that inhibition enhances subthreshold direction
selectivity through null-direction suppression, and we would predict
that GABAergic neurons preferring the opposite direction would contribute
to this suppression.

GABAergic neurons in ferret V1 are direction-tuned and form direction
columns aligned with the underlying intrinsic signal direction
preference map (Extended Data Fig. 8a, b). For GABAergic neurons
to innervate oppositely tuned excitatory cells, their projections must
extend beyond the local direction domain and into adjacent cortical
columns. This would be inconsistent with studies from mouse V1,
where excitatory neurons receive inhibitory input from local (within
100–200 μm) GABAergic neurons. However, in carnivore V1, it
has been shown that GABAergic neurons make axonal projections that
span longer distances. To test whether GABAergic neurons project
beyond their local cortical columns, we labelled axon projections with
punctate injections of AAV2/1-mDlx-GCaMP6s and characterized
the direction tuning of axon projections at sites distal to the injection
location (Fig. 4a–d). A substantial fraction (60.5%) of long-range projecting
individual boutons exhibited direction-selective responses
Fig. 2 | Subthreshold direction selectivity and evidence for null
direction suppression. a, Example direction tuning of subthreshold Vm
(grey) and spiking responses (black). Data are mean ± s.e.m. b, Example
single trial responses. c, Distribution of Vm DSI for 54 cells with direction-selective
spiking (DSI > 0.3). d, No relationship between preferred
direction response and Vm DSI (n = 76 from 23 animals). e, Relationship
between null direction response and Vm DSI; grey line is least-squares fit
(n = 76). f, No relationship between preferred direction Vm noise (defined
as the difference in Vm s.d. between a given stimulus and the blank) and Vm
DSI (n = 76). g, Relationship between null direction Vm noise and Vm DSI;
grey line is least-squares fit (n = 76).


(Fig. 4e; population median 0.39 ± 0.46 IQR, n = 815 boutons, 8 fields
of view from 2 animals) and diverse direction preferences when compared
to the intrinsic signal map (Fig. 4d). We found an unexpected
abundance of direction-selective GABAergic boutons tuned to the
direction opposite to direction domains (Fig. 4f). Furthermore, individual
bouton preferences were significantly different from the map
(Monte Carlo significance test, P < 0.001, n = 493 boutons), providing
an anatomical substrate for the synaptic inhibition observed in our
previous measurements.

To examine whether individual neurons receive inhibitory synaptic
input from distant GABAergic neurons, we developed a technique
called somatically targeted optogenetic membrane potential mapping
(STOMPM) to directly map the spatial connectivity of inhibitory
neurons onto excitatory neurons in vivo. We localized channelrhodopsin-2
to the soma and proximal dendrites of GABAergic neurons using a
Kv2.1 targeting motif (Fig. 4g) to prevent stimulation of the neuropil
and to enhance our functional resolution. As the direction preferences
of GABAergic neurons are smoothly mapped in a columnar fashion
(Extended Data Fig. 8), we used patterned photostimulation driven by
a digital light processing projector to activate GABAergic neurons
in local cortical regions (~100–200 μm, Fig. 4g) while recording from
single neurons to measure inhibitory postsynaptic potentials (IPSPs)
(example cell, Extended Data Fig. 9). Optical stimulation evoked robust
IPSPs (Fig. 4h) even at spots distant from recorded cells. Neurons
received inhibitory synaptic input from long distances (Fig. 4i); inhibitory
input fields often exceeded 1 mm along their major axes (Fig. 4i–k;
major axis length 930 ± 278 μm, median ± IQR, n = 21 cells from 7 animals)
and many inputs arrived from distances greater than 500 μm
(Extended Data Fig. 10). We recognize that these measures are likely to
underestimate the total extent of input field size (see Methods). Finally,
we aligned our stimulation grid with the intrinsic signal direction
preference map (Fig. 4l) to characterize the functional origin of evoked
IPSPs. Neurons with direction-selective Vm (n = 7 cells, mean tuning
curve in Fig. 4m) received almost equivalent inhibitory synaptic input
from null-tuned as from preferred-tuned direction domains (Fig. 4n).

Previous studies suggest that inhibition and excitation are generally
co-tuned (Fig. 4o), as shown for orientation selectivity
in mouse V1 and layer 4 simple cells of cat V1, albeit with distinct
temporal dynamics, acting to scale or gate overall responses.
Fig. 3 | Differential tuning between excitation and inhibition enhances
direction selectivity. a, Estimated excitatory (blue, Ge) and inhibitory
(red, Gi) synaptic conductances driven by gratings from an example cell;
line is bootstrapped mean and error bars are bootstrapped s.d. b, Tuning
of peak (see Methods) synaptic conductances and predicted Vm (dashed
line) for cell in a; data are bootstrapped mean and s.d. c, Comparison of
Ge and Gi DSI (n = 10 from 7 animals). d, Predicted Vm DSI (see Methods)
compared to Ge DSI (n = 10). e, Comparison of Gi/Ge at null and preferred
directions (n = 10). f, Predicted Vm DSI compared to null direction
Gi/Ge; grey line is least-squares fit (n = 10). g, Example Vm during visual
stimulation and inactivation of GABAergic neurons expressing GtACR2
(cyan) or without inactivation (black); dashed line is resting Vm.
h, Comparison of Vm DSI with and without inactivation; black line
indicates population means (n = 16 from 4 animals). i, Optogenetic
modulation of preferred direction response versus Vm DSI (n = 16).
j, Optogenetic modulation of null direction response versus Vm DSI; grey
line is least-squares fit (n = 16).
Fig. 4 | Inhibitory interneurons make long-range, intercolumnar
projections onto excitatory neurons. a, Epifluorescence image of injection
site with GABAergic axon imaging sites highlighted; example field of
view (FOV) in red. b, Intrinsic signal polar direction map for a. c, Example
bouton FOV. d, GABAergic boutons overlaid on direction preference map;
direction preference of boutons and intrinsic signal pseudocoloured as in
b, bi-directional boutons coloured grey. e, Example bouton tuning curve
(box in d); data are mean ± s.e.m. f, Distribution of direction preference
difference between GABAergic boutons (n = 493) and corresponding
intrinsic signal direction preference map. g, Top, Flag staining of
cells expressing AAV1-mDlx-ChR2-Flag-Kv2.1-p2a-H2b-CyRFP;
bottom, experimental design: neurons in different cortical columns are
optogenetically activated. h, Example of single spot illumination and Vm
responses. i, Mean IPSP waveforms evoked by sampled spots. j, Map of


By contrast, we find that cortical inhibition can suppress responses to
specific stimuli through differential tuning with excitation (Fig. 4o).
Such differential tuning can arise through multiple combinations of
excitation and inhibition, such that null-direction suppression is driven
by either null-biased or equivalent inhibitory inputs for both directions
(Fig. 4o). Differential tuning can enhance subthreshold selectivity,
which is further augmented through spike threshold (Fig. 4o). Our
findings are conceptually similar to those in retinal ganglion cells, but
differ in exact circuit implementation as ganglion cell direction selectivity
arises through inhibitory input mediated by starburst amacrine
cells. One factor we did not consider is the temporal interplay between
excitation and inhibition, which could have an important contributing
role in enhancing selectivity. Together with the results of previous
studies, our findings indicate that the selective responses of cortical
neurons are built with a broadly tuned palette of excitatory synaptic
inputs that is further refined by enhancing responses to the preferred
stimulus and suppressing responses to non-preferred stimuli.


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0354-1 
IPSP amplitudes. k, Distribution of IPSP-field major axis lengths across 
cells (n = 21). l, Example stimulation grid aligned to intrinsic signal polar 
direction map. m, Peak-aligned average direction tuning curve for cells 
with direction-tuned membrane potential (DSI > 0.3, black, individual 
cells in grey, n = 7). n, Fraction of spots tuned to a cell's preferred (<45°) 
or null (>135°) direction (grey bars, mean ± s.e.m.). o, Cartoon model 
of co-tuning (top) and differential tuning (bottom) of excitation (Ge) and 
inhibition (Gi) for direction. Subthreshold direction selectivity is inherited 
from synaptic conductances when co-tuned. Differential tuning of Ge and 
Gi, whereby there is greater Gi/Ge at the null direction, can preferentially 
suppress excitation and enhance subthreshold selectivity. With differential 
tuning, inhibition can be either bidirectional or oppositely tuned for 
direction relative to Ge. 
METHODS 


All procedures were performed according to NIH guidelines and were approved by 
the Institutional Animal Care and Use Committee at Max Planck Florida Institute 
for Neuroscience. 

Constructs. pAAV-mDlx-GCaMP6f-Fishell-2 was a gift from G. Fishell 
(Addgene plasmid # 83899). pFUGW-hGtACR2-EYFP was a gift from J. Spudich 
(Addgene plasmid # 67877). pCAG-CyRFP1 was a gift from R. Yasuda (Addgene 
plasmid # 84356). pAAV.mDlx.ChR2-Flag-Kv2.1.p2a.H2b-CyRFP was a gift from 
M. Bolton. 

Virus injection. Female ferrets aged about 30 days (~P30; Marshall Farms) were 
anaesthetized with ketamine (50 mg/kg) and isoflurane (1-3%, delivered in O2), 
then intubated and artificially respirated. Atropine was administered to reduce 
secretions and a 1:1 mixture of lidocaine and bupivacaine administered subcu- 
taneously into the scalp. Animals were placed on a feedback-controlled heating 
pad to maintain internal temperature at 37°C. Under sterile surgical conditions, 
a small craniotomy (0.8 mm diameter) was made over the visual cortex 7-8 mm 
lateral and 2-3 mm anterior to the lambda. 

For spine imaging, we injected (52 nl per depth) a mixture (1:100,000) of 
AAV1-hSyn-Cre and AAV1-Syn-FLEX-GCaMP6s (UPenn, ~1 x 10'? GC/ml) at 
400 and 200 µm below the pia through bevelled glass micropipettes (10-15 µm 
outer diameter). For imaging GABAergic axons or somata, we injected 5-30 nl 
of AAV1-mDlx-GCaMP6s at 400 and 200 µm below the pia. For optogenetic 
inactivation experiments, we injected 1 µl of AAV1-mDlx-GtACR2-eYFP (titre 
>1 x 10! GC/ml, custom preparation from Vigene) at 400 and 200 µm below the 
pia through bevelled glass micropipettes (15-20 µm outer diameter). For STOMPM 
(see below), we injected 1 µl of AAV1-mDlx-ChR2-Flag-Kv2.1-p2a-H2b-CyRFP 
(titre >1 x 10° GC/ml, custom preparation from Vigene) through bevelled glass 
micropipettes (15-20 µm outer diameter). To prevent dural regrowth and adhesion, 
the craniotomy was filled with sterile 1% w/v agarose (Type IIIa, Sigma-Aldrich). 

Cranial window. After 3-5 weeks of expression, ferrets were anaesthetized with 
50 mg/kg ketamine and 1-3% isoflurane. Atropine and bupivacaine were 
administered as in the virus injection procedure. Animals were placed on a feedback-controlled 
heating pad to maintain an internal temperature of 37-38°C. A tracheotomy was 
performed and an endotracheal tube installed to artificially respirate the animal. 
Isoflurane was delivered between 1 and 3% throughout the surgical procedure to 
maintain a surgical plane of anaesthesia. An intravenous cannula was placed to 
deliver fluids. ECG, end tidal CO2, external temperature and internal temperature 
were continuously monitored throughout the imaging session. 

The scalp was retracted and a custom titanium headplate adhered to the skull 
using C&B Metabond (Parkell). A 6-7-mm craniotomy was performed at the viral 
injection site and the dura retracted to reveal the cortex. For spine and axon imag- 
ing, one to two pieces of custom coverglass (3 mm diameter, 0.7 mm thickness, 
Warner Instruments) were adhered to a larger coverglass (8 mm diameter, #1.5 
thickness, Electron Microscopy Sciences) using optical adhesive (# 71, Norland 
Products) and placed onto the brain to gently compress the underlying cortex and 
dampen biological motion during imaging. For population imaging, a single 
coverglass (5 mm diameter, #1.5 thickness, Electron Microscopy Sciences) was adhered 
to the bottom of a titanium insert and then placed onto the brain. In both cases, a 
stainless steel retaining ring (5/16-inch internal retaining ring, McMaster-Carr) 
maintained downward pressure on the cranial window throughout the experiment. 

For whole-cell recording and optogenetic experiments, the cranial window was 
filled with agarose (1.6% w/v, Type IIIa, Sigma) and a coverglass placed on top 
of the agarose. For pipette access, we drilled holes offset from the centre of the 
coverglass to allow our pipette to approach the centre of the cranial window at 
an oblique angle. The cranial window was hermetically sealed using a stainless 
steel retaining ring (5/16-inch internal retaining ring, McMaster-Carr), Kwik-Cast 
(World Precision Instruments), and Vetbond (3M). A 1:1 mixture of 1% 
tropicamide ophthalmic solution (Akorn) and 10% phenylephrine hydrochloride 
ophthalmic solution (Akorn) was applied to both eyes to dilate the pupils and retract 
the nictitating membranes. Contact lenses were inserted to protect the eyes. Upon 
completion of the surgical procedure, isoflurane was gradually reduced (0.6 to 
1.5%) and then vecuronium (2 mg kg⁻¹ hr⁻¹) or pancuronium (2 mg kg⁻¹ hr⁻¹) 
was delivered intravenously to immobilize the animal. 

Visual stimuli. Visual stimuli were generated using Psychopy*. The monitor was 
typically placed 25 cm from the animal. After mapping somatic direction tuning 
using a grating protocol, we presented the somatic preferred and null directions 
of motion while imaging dendrites and dendritic spines. For whole-cell recording, 
we optimized the preferred spatial frequency of the stimulus for the cell being 
recorded. Typical preferred spatial frequencies ranged from 0.04 to 0.25 cycles 
per degree. 

Two-photon imaging. Two-photon imaging was performed on a Bergamo II 
microscope (Thorlabs) running Scanimage* 2015 or 2016 (Vidrio Technologies) 
with 940-nm dispersion-compensated excitation provided by an Insight DS+ 
(Spectraphysics). For spine and axon imaging, power after the objective was limited 
to <60 mW, dependent on depth. Cells were selected for imaging on the basis 
of their position relative to large blood vessels, responsiveness to visual 
stimulation, and lack of prolonged calcium transients resulting from overexpression 
of GCaMP6s. Images were collected at 30 Hz using bidirectional scanning with 
512 × 512 pixel resolution. Images of somata ranged from 50 to 100 µm on a side, 
while images of dendrites were ~40 µm on a side. Images of axons were collected 
at 512 × 512 pixel resolution with fields of view ~100 µm on a side. 

Whole-cell patch-clamp recordings. Recordings were performed by inserting a 
pipette through an agarose-filled craniotomy or by using a coverglass with a hole 
drilled for pipette access. A silver-silver chloride reference electrode was inserted 
into the agarose or muscle. Recordings were made in current clamp mode and 
current pulses delivered by custom Labview software. 

For measurements of membrane potential tuning, spike tuning, effects of 
optogenetic inhibition and connectivity mapping, pipettes of 5-8 MΩ resistance were 
pulled using borosilicate glass (King Precision Glass) and filled with an 
intracellular solution containing (in mM) 135 K gluconate, 4 KCl, 10 HEPES, 10 
Na2-phosphocreatine, 4 Mg-ATP, 0.3 Na3GTP, 0-0.1 Alexa 594 or 488, pH 7.2, 295 
mOsm. Neurons were recorded from layer 2/3 (100 to 800 µm below the pia) using 
a Multiclamp 700B (Molecular Devices). Series resistance and pipette capacitance 
were corrected online. Series resistance for recordings typically ranged from 20 to 
80 MΩ. Analogue signals were digitized using Spike2 (CED). For optogenetic 
inactivation experiments, a fibre (1 mm, NA 0.63) coupled to a 455 nm LED light source 
(Prizmatix) was lowered to 3-5 mm above the cortical surface. Power density 
at the cortical surface ranged from 1 to 4 mW/mm². Optogenetic stimulation either 
coincided with visual stimulation, or began with a brief ramp (100 to 300 ms) 
before visual stimulation. 

For measurements of synaptic conductances, the internal solution contained 
(in mM) 135 Cs-MeSO4, 10 QX-314, 4 TEA-Cl, 2 EGTA, 2 MgATP, 10 HEPES, 10 
Na2-phosphocreatine (pH 7.3, 295 mOsm) and pipettes were typically 6-9 MΩ. 
Capacitance compensation was corrected online and series resistance was corrected 
online or offline. Conductance measurements typically began around 30 min after 
break-in to allow the internal solution of the pipette to dialyse the cell, eliminating 
action potentials and depolarizing the resting membrane potential as expected with 
the use of Cs+ and QX-314. 

Connectivity mapping. Connectivity mapping (STOMPM) was performed on 
a custom-built microscope based on previously published designs. A digital 
light processing projector (X600, Optoma) with its colour wheel removed was 
mounted to a tilt platform (Siskiyou) and linear stage (Thorlabs). A 50 mm f/1.4 
SLR lens (Nikkor) was mounted as close as possible to the projector and coupled 
to an achromatic doublet (AC508-150-A). Light passed through a blue dichroic 
filter (52-532, Edmund Optics) and was reflected onto the sample using a dichroic 
mirror (T495LPXR, Chroma), and focused onto the sample using a 35 mm f/2.0 
SLR lens (Nikkor). Emission light passed through a 105 mm f/2.0 lens (Nikkor) 
and an emission filter (FF01-600) and was imaged onto a camera (Xyla, Andor) 
controlled by Micromanager (http://www.micro-manager.org). Single pixels on the 
DMD corresponded to ~4 µm at the sample. Diffuse background light was <0.1 
mW/mm². Opsin was restricted to the soma using the Kv2.1 targeting motif. 
Before obtaining whole-cell recordings, we focused excitation light on the cortical 
surface. Upon break-in, we first measured the direction tuning of the cell using a 
grating protocol. Then, we centred a stimulation grid on the pipette and delivered 
25-50 trials of random grid stimulation. Spots were typically 100-200 µm full width 
at half-maximum (FWHM), 1-3 mW power, and displayed for 100 ms. We used 
positive current injection to depolarize the cell and increase the driving force for 
IPSPs (reversal potential −75 to −70 mV, Extended Data Fig. 9). We probably 
underestimate input field sizes owing to limitations in the spatial spread of virus injection, 
blue light absorption in blood vessels and experimental geometry in which the 
large patch pipette interferes with light stimulation. 

Intrinsic signal imaging. Intrinsic signal imaging was performed on the STOMPM 
microscope or on the Thorlabs Bergamo II. The cortex was illuminated with blue 
light to obtain a blood vessel map, after which collimated 630 nm light from an 
LED (Thorlabs) was directed onto the surface of the brain to measure intrinsic 
haemodynamic responses. Visually evoked responses were collected at ~50 Hz 
using an Andor Xyla camera. Visual stimuli were blockwise grating stimuli (8 s on, 
8 s off, 0.06-0.1 cycles per degree, 16 directions). 

Fixation and immunostaining. Upon completion of imaging, isoflurane was 
raised to 5% and 0.5 ml Euthasol given IV. The animal was transcardially perfused 
with 100 ml of 0.9% NaCl (w/v) and then 500 ml of 4% paraformaldehyde in 0.1 M 
phosphate buffer (PB). The brain was dissected and post-fixed overnight in 4% 
PFA in 0.1 M PB at 4°C. Cryoprotection was carried out in 30% sucrose for 2 days, 
at which time tissue was sliced at 50 µm on a Leica SM 2010R. Cryosections were 
rinsed in 0.1 M phosphate buffered saline (PBS), blocked in blocking solution 
containing 1% BSA, 2% normal goat serum, and 0.3% triton X-100 in PBS for 1 h, and 
then incubated in mouse anti-FlagM2 at 1:500 (Sigma cat# F1804) overnight at 
room temperature. After three washes in buffer, 10 min each, the sections were 
incubated in Alexa goat anti-mouse 488 at 1:500 (Thermo Fisher, cat# A32723) for 
2 h at room temperature. After incubation, sections were washed twice in 0.1 M PBS 
for 10 min each, followed by one wash in 0.1 M PB. Sections were then mounted on 
Superfrost Plus slides (VWR, West Chester, PA) and coverslipped with Slowfade 
Gold (Thermo Fisher cat# S36936). Control slides were treated without the 
primary antibody. These control sections showed no labelling. To test the specificity 
of secondary antibodies, the secondary antibody was applied to the tissue without 
a primary antibody; no staining was observed in these controls. 

Analysis. Calcium imaging. Imaging data were excluded from analysis if motion 
along the z-axis was detected. Dendrite images were corrected for in-plane motion 
via a 2D cross-correlation-based approach in MATLAB. Axon images were 
corrected for in-plane motion using a piecewise non-rigid motion correction 
algorithm. ROIs were drawn in ImageJ; dendritic ROIs spanned contiguous dendritic 
segments and bouton/spine ROIs were circular. Mean pixel values for single ROIs 
were computed over the imaging time series and imported into MATLAB via 
MIJ. ΔF/F0 was computed by defining F0 using a 60 s percentile filter (typically 
10th percentile), which was then low-pass filtered at 0.01 Hz. Bouton and somatic 
responses were computed as the average response to the visual stimulus and were 
included for analysis of direction selectivity if ΔF/F0 exceeded 10% and 1 − CV was 
>0.1. ΔF/F0 traces were median filtered with a 3-sample window. For spine signals, 
we subtracted a scaled version of the dendritic signal to remove backpropagating 
action potentials as previously performed. ΔF/F0 traces were synchronized to 
stimulus triggers sent from Psychopy and collected by Spike2. Spines were included 
for analysis if the average response exceeded 2 median absolute deviations above 
the baseline noise (measured during the blank) and were weakly correlated with 
the dendritic signal (Spearman's correlation, r < 0.3). Some spine traces contained 
negative events after subtraction, so we excluded negative ΔF/F0 values when 
computing Spearman's correlation between the spine and the dendrite. Because the 
amplitudes of NMDA-receptor-mediated calcium transients are not necessarily 
correlated with EPSP amplitude at the soma, we normalized each spine's responses so 
that each spine had equal weight when computing summed spine inputs. Summed 
spine inputs were computed as the average spine response to each stimulus, 
bootstrapped 100 times. We also compared tuning of populations of spine inputs to 
somatic output by including response amplitude in the calculation and by 
computing the fraction of spines that preferentially respond to the preferred direction. 
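
The baseline step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB code: the function names, the nearest-rank percentile and the toy trace are assumptions, and the subsequent 0.01 Hz low-pass filtering of F0 is omitted. At the 30 Hz frame rate reported above, a 60 s window would correspond to 1,800 samples; a much shorter window is used here to keep the example readable.

```python
def rolling_percentile_baseline(trace, window, percentile=10.0):
    """Estimate F0 as a rolling percentile of the raw fluorescence trace.

    Assumption: a nearest-rank percentile over a centred window that is
    truncated at the edges of the trace.
    """
    half = window // 2
    baseline = []
    for i in range(len(trace)):
        chunk = sorted(trace[max(0, i - half): i + half + 1])
        rank = int(round((percentile / 100.0) * (len(chunk) - 1)))
        baseline.append(chunk[rank])
    return baseline


def delta_f_over_f0(trace, window, percentile=10.0):
    """Return (F - F0) / F0 for every sample of the trace."""
    f0 = rolling_percentile_baseline(trace, window, percentile)
    return [(f - b) / b for f, b in zip(trace, f0)]


# Hypothetical trace: flat baseline of 100 with a brief transient of 150.
trace = [100.0] * 50
for i in range(20, 25):
    trace[i] = 150.0

dff = delta_f_over_f0(trace, window=21)
```

Because the transient occupies only a small fraction of each window, the low percentile tracks the resting fluorescence and the transient appears as a positive ΔF/F0 excursion.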

DSI was computed as: 

$$\mathrm{DSI} = \frac{\mathrm{Preferred} - \mathrm{Null}}{\mathrm{Preferred} + \mathrm{Null}}$$
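
As a worked illustration of this index, the short Python sketch below evaluates it for two made-up response pairs. The function name and the example values are hypothetical; only the formula and the DSI > 0.3 cutoff come from the text.

```python
def direction_selectivity_index(preferred, null):
    """DSI = (preferred - null) / (preferred + null)."""
    total = preferred + null
    if total == 0:
        raise ValueError("DSI is undefined when preferred + null is zero")
    return (preferred - null) / total


# Hypothetical response amplitudes in arbitrary units.
dsi_tuned = direction_selectivity_index(8.0, 2.0)
dsi_untuned = direction_selectivity_index(5.0, 5.0)

# The text treats cells with DSI > 0.3 as direction tuned.
is_tuned = dsi_tuned > 0.3
```

A response that is equal in both directions gives a DSI of 0, and a response that is absent in the null direction gives a DSI of 1.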


Whole-cell recording. Membrane potential recordings were median filtered with 
a 30 to 100 sample window to remove action potentials and binned to 5 ms. 
Responses to individual stimulus cycles were extracted for Vm and spikes separately. 
Mean (F0) and modulation amplitudes (F1 and F2) of each cycle were computed via 
Fast Fourier Transform (MATLAB). Vm and spiking peak responses were computed 
as previously described. Some cells exhibited Vm modulation at F2, so we also 
included the F2 component when computing Vm responses. For computing Vm 
noise, we aligned cycle responses across trials, then took the standard deviation for 
each time point. Vm standard deviation was computed as the mean of this standard 
deviation value for each stimulus. 
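
The per-cycle Fourier decomposition can be illustrated with a direct discrete Fourier transform in Python. This is a sketch under stated assumptions, not the authors' MATLAB routine: the function name, the 40-bin cycle (a hypothetical 200 ms cycle at the 5 ms bin width given above) and the synthetic Vm values are all made up for the example.

```python
import cmath
import math


def cycle_harmonics(cycle, max_harmonic=2):
    """Return [F0, F1, ..., F_max] for one stimulus cycle.

    F0 is the mean of the cycle; Fk (k >= 1) is the peak amplitude of the
    k-th harmonic of the cycle frequency.
    """
    n = len(cycle)
    amplitudes = []
    for k in range(max_harmonic + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(cycle))
        if k == 0:
            amplitudes.append(coeff.real / n)       # mean, sign preserved
        else:
            amplitudes.append(2.0 * abs(coeff) / n)  # one-sided amplitude
    return amplitudes


# Synthetic cycle: -60 mV mean, 3 mV modulation at F1, 1 mV at F2.
n_bins = 40
cycle = [-60.0
         + 3.0 * math.sin(2 * math.pi * t / n_bins)
         + 1.0 * math.cos(4 * math.pi * t / n_bins)
         for t in range(n_bins)]

f0, f1, f2 = cycle_harmonics(cycle)
```

For a cycle sampled over exactly one period, the harmonics recover the mean and the two modulation amplitudes that were used to build the synthetic trace.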

Conductance measurements were made in current-clamp mode. Multiple 
current steps depolarized or hyperpolarized the neuron close to the reversal 
potential for inhibition and excitation, respectively. Leak-subtracted synaptic 
conductances were computed by estimating Gleak using the blank stimulus and 
then performing a linear fit of measured membrane potential responses at 
different current injections. Mean and standard deviation of synaptic conductances 
were computed with a bootstrap (100 iterations). Cells were excluded from further 
analysis if negative conductances were extracted across multiple stimuli. To predict 
membrane potential responses from empirically measured synaptic conductances, 
we computed stimulus-dependent responses as previously described: 
in which Rl is −50 mV, Vr is −50 mV, Re is 0 mV, Ri is −70 mV, θ is the direction 
of motion, gl is the measured leak conductance, and ge(θ) and gi(θ) are measured 
synaptic conductances. 
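
The expression itself did not survive in this copy of the text, so the sketch below should be read only as an illustration of the general idea. It assumes the textbook steady-state membrane equation, in which the predicted potential is the conductance-weighted average of the reversal potentials; this may differ from the exact expression the authors used (for example, in how the resting potential enters). The function name and conductance values are hypothetical; the reversal potentials are those listed above.

```python
def predicted_vm(g_leak, g_exc, g_inh, r_leak=-50.0, r_exc=0.0, r_inh=-70.0):
    """Steady-state membrane potential (mV) for given conductances.

    Assumed form (not taken from the source text):
        V = (g_l*R_l + g_e*R_e + g_i*R_i) / (g_l + g_e + g_i)
    Conductances can be in any unit as long as all three share it.
    """
    total = g_leak + g_exc + g_inh
    if total <= 0:
        raise ValueError("total conductance must be positive")
    return (g_leak * r_leak + g_exc * r_exc + g_inh * r_inh) / total


# Hypothetical conductances (nS) for a preferred and a null direction.
vm_preferred = predicted_vm(g_leak=10.0, g_exc=6.0, g_inh=2.0)
vm_null = predicted_vm(g_leak=10.0, g_exc=4.0, g_inh=6.0)
```

In this toy case, a higher inhibition-to-excitation ratio at the null direction pulls the predicted potential further from threshold than the change in excitation alone would.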

Optogenetic stimulation experiments compared visually evoked subthreshold 
responses under blue light stimulation with the responses obtained without blue 
light stimulation. Modulation ratio was computed as the response amplitude with 
blue light on divided by the response amplitude with blue light off. 

For connectivity mapping, membrane potential traces were median filtered 
with a time window of 1.2 ms. We defined the prestimulus membrane potential 
as the membrane potential in the 9 ms before IPSP onset. Because of spontaneous 
activity in vivo, single trials were excluded if cells showed large depolarizations 
(>5 mV) relative to the prestimulus membrane potential. Significant IPSPs were 
defined as average IPSPs exceeding three standard deviations below the mean of 
the prestimulus membrane potential. We used the centroid of the significant IPSP 
field for distance measurements from single cells. Ellipse fits of the binarized 
significant IPSP array were computed using the MATLAB function 'regionprops'. 
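
The two thresholds in this paragraph can be expressed as a short Python sketch. It is illustrative only: the function names and sample values are hypothetical, and the use of the population standard deviation of the prestimulus samples is an assumption, since the text does not say how the standard deviation was estimated.

```python
from statistics import mean, pstdev


def keep_trial(prestim_vm, trial_vm, max_depol_mv=5.0):
    """Reject trials with large depolarizations relative to baseline."""
    baseline = mean(prestim_vm)
    return max(trial_vm) - baseline <= max_depol_mv


def is_significant_ipsp(prestim_vm, ipsp_peak_vm, n_sd=3.0):
    """True if the averaged IPSP peak lies more than n_sd standard
    deviations below the mean prestimulus membrane potential."""
    threshold = mean(prestim_vm) - n_sd * pstdev(prestim_vm)
    return ipsp_peak_vm < threshold


# Hypothetical prestimulus samples (mV) and response values.
prestim = [-50.0, -50.2, -49.8, -50.1, -49.9]

clear_ipsp = is_significant_ipsp(prestim, ipsp_peak_vm=-51.0)
weak_ipsp = is_significant_ipsp(prestim, ipsp_peak_vm=-50.2)
noisy_trial_kept = keep_trial(prestim, trial_vm=[-50.0, -43.0, -49.0])
```

Applying the test to every stimulation spot gives the binarized array of significant IPSPs to which the centroid and ellipse fits are applied.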

Intrinsic signal imaging. Single-condition maps were computed by comparing 
whether reflectance changes evoked by a single-stimulus condition could be 
discriminated from reflectance changes evoked across all presented stimuli. To 
discriminate a single-condition stimulus at each pixel, reflectance changes across 
all stimuli were combined into a normalized histogram, and then a pixel's single 
condition response was computed non-parametrically as the probability of the area 
under a ROC curve (using the trapezoidal rule). Maps were filtered as previously 
described using a bandpass Fermi filter. Single bouton direction preferences were 
compared to the direction preference of the intrinsic signal direction preference 
map contained within the 100 µm two-photon field of view. Somatic direction 
preference was compared to the direction preference of the intrinsic map at the location 
of the cell. For STOMPM, stimulation grids were aligned to blood vessel reference 
maps for intrinsic signal imaging using an affine transform. We computed binary 
masks for each stimulation spot, and used these masks to measure intrinsic signal 
direction preference at single stimulation spots. 
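
The area-under-the-ROC-curve measure can be sketched for a single pixel as follows. This is a simplified illustration, not the authors' implementation: the function name and the response values are hypothetical, and the convention that larger values count as stronger responses is an assumption (the sign of intrinsic reflectance changes may call for the opposite convention).

```python
def roc_auc(condition_values, pooled_values):
    """Area under the ROC curve, integrated with the trapezoidal rule.

    Sweeps a threshold from high to low and compares how often values from
    one stimulus condition exceed it against values pooled over all stimuli.
    """
    thresholds = sorted(set(condition_values) | set(pooled_values),
                        reverse=True)
    points = [(0.0, 0.0)]
    for thr in thresholds:
        tpr = sum(v >= thr for v in condition_values) / len(condition_values)
        fpr = sum(v >= thr for v in pooled_values) / len(pooled_values)
        points.append((fpr, tpr))
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc


# Hypothetical single-pixel responses in arbitrary units.
auc_selective = roc_auc([3.0, 4.0, 5.0], [0.0, 1.0, 2.0])
auc_unselective = roc_auc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

A value near 0.5 means the condition cannot be told apart from the pooled responses, and values approaching 1 mean it is reliably discriminable.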

Statistics. Sample sizes are similar to others used in the field. No statistical methods 
were used to predetermine sample size. Inclusion criteria for each experiment are 
detailed in the Methods. The experiments were not randomized. The experimenter 
was blind to location in the direction preference map when performing 
map-related experiments; otherwise, the investigators were not blinded to allocation 
during experiments and outcome assessment. To test whether two distributions of 
direction preference were significantly different from random, we compared the 
median difference with a null distribution generated from Monte Carlo simulations 
(n = 1,000). For each Monte Carlo simulation, we calculated the median difference 
between two randomly sampled distributions of direction preferences drawn from 
a uniform distribution ranging from 0° to 359° with sample sizes equivalent to the 
measured distributions. Statistical tests were non-parametric and two-sided, except 
for the Monte Carlo significance tests, which were one-sided. All correlation 
values reported were computed using Spearman's correlation. 
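
The Monte Carlo test described above can be sketched in Python as follows. Several details are assumptions made for illustration and are not stated in the text: the function names, the pairing of samples element by element, the use of the absolute circular difference (0° to 180°), the fixed random seed, and the `tail` argument used to select the direction of the one-sided test.

```python
import random
from statistics import median


def angular_difference(a, b):
    """Absolute difference between two directions, in degrees (0 to 180)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def monte_carlo_p_value(observed_median, n_samples, n_sims=1000,
                        tail="smaller", seed=0):
    """One-sided P value for an observed median direction difference.

    Each simulation draws two sets of directions uniformly from 0-359 degrees
    and records the median of their paired differences.
    """
    rng = random.Random(seed)
    null_medians = []
    for _ in range(n_sims):
        a = [rng.randrange(360) for _ in range(n_samples)]
        b = [rng.randrange(360) for _ in range(n_samples)]
        null_medians.append(
            median(angular_difference(x, y) for x, y in zip(a, b)))
    if tail == "smaller":
        extreme = sum(m <= observed_median for m in null_medians)
    else:
        extreme = sum(m >= observed_median for m in null_medians)
    return extreme / n_sims


# Hypothetical observations: a small median difference in a large sample,
# and a large median difference in a small sample.
p_similar = monte_carlo_p_value(10.0, n_samples=100, tail="smaller")
p_different = monte_carlo_p_value(135.0, n_samples=10, tail="larger")
```

Under the null hypothesis the median difference is centred near 90°, so observed medians far below or far above that value yield small one-sided P values.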

Code availability. Analyses were performed in MATLAB using standard 
functions. Custom code is available from the corresponding author upon reasonable 
request. 

Data availability. Source data are provided for graphical data representations in 
Figs. 1d-f, 2c-g, 3c-f, h-j, and 4f, k, m, n. Data are available from the correspond- 
ing author upon reasonable request. 
Extended Data Fig. 1 | Summed spine inputs fail to predict somatic 
direction selectivity, regardless of the method used to compute the 
sum. a, No significant correlation between the DSI of summed spine 
inputs (with amplitude included) and somatic DSI. Spearman's r = −0.11, 
P = 0.68, n = 17. b, No significant correlation between the fraction of 
spines that respond more strongly to the preferred direction and somatic 
DSI. Spearman's r = −0.082, P = 0.75, n = 17. 
Extended Data Fig. 2 | Distribution of spiking DSI. Dashed line indicates cutoff of DSI > 0.3; n = 69 cells with spiking responses. 
Extended Data Fig. 3 | Example of noise suppression at null stimulus relative to blank. Figure shows responses to preferred, null and blank. 
Extended Data Fig. 4 | Direction tuning fits for excitatory and 
inhibitory conductances. a, Difference in direction preference of 
excitation and inhibition is significantly greater than chance; Monte 
Carlo significance test, P = 0.023; difference in direction preference, 
135° ± 95°, median ± IQR, n = 10 cells from 7 animals. b, FWHM of 
excitation and inhibition were not significantly different. FWHM 61° ± 46° 
and 61° ± 110° for excitation and inhibition, respectively; median ± IQR, 
n = 10, Wilcoxon sign-rank P = 0.70. c, Individual (grey) and population 
average (coloured) tuning curves for Ge, Gi and predicted Vm, peak-aligned 
to excitation. 


© 2018 Springer Nature Limited. All rights reserved. 




[Plot axes: simulated Vm DSI (vertical) against Gi/Ge at preferred direction (horizontal).] 


Extended Data Fig. 5 | I/E ratio at preferred direction is not correlated 
with simulated subthreshold direction selectivity. Spearman's r= 0.0061, 
P=1,n=10 cells from 7 animals. 
Extended Data Fig. 6 | Putative GABAergic neuron directly suppressed by blue light. Error bars, mean ± s.e.m. 
Extended Data Fig. 7 | Additional data related to blue light 
photoinhibition of GABAergic neurons. a, Optogenetic suppression of 
GABAergic neurons significantly reduces spiking direction selectivity; 
Wilcoxon sign-rank, n = 14 cells with spiking responses, P = 0.0049. 
Black line, mean; grey lines, single cells. b, Absolute Vm depolarization 
induced by blue light is not related to optogenetic changes in Vm direction 
selectivity (computed as the difference in DSI between light off and light 
on conditions); Spearman's r = 0.11, P = 0.70, n = 14 cells with spiking 
responses from 4 animals. 
Extended Data Fig. 8 | Alignment of GABAergic neurons with intrinsic 
signal polar direction map. a, Underlying intrinsic signal polar direction 
map with direction-tuned GABAergic neurons overlaid. b, Direction 
preferences of inhibitory neurons and intrinsic signal direction preference 
map are significantly more similar than chance; P < 0.001, Monte Carlo 
significance test, n = 76 direction-selective neurons from 3 planes in 1 
animal. 
Extended Data Fig. 9 | Reversal potential of optogenetically evoked PSPs is consistent with inhibition. Grey points are individual data points; black is 
mean ± s.e.m. Data come from individual stimulation trials from one cell. 
Extended Data Fig. 10 | Relationship of IPSP amplitude and distance. Grey points are individual data points; black is binned mean ± s.e.m. Data come 
from trial-averaged stimulation responses from n = 21 cells from 7 animals. 
Pancreatic islets communicate with lymphoid 
tissues via exocytosis of insulin peptides 


Xiaoxiao Wan, Bernd H. Zinselmeyer, Pavel N. Zakharov, Anthony N. Vomund, Ruth Taniguchi, Laura Santambrogio, 
Mark S. Anderson, Cheryl F. Lichti & Emil R. Unanue 


Tissue-specific autoimmunity occurs when selected antigens 
presented by susceptible alleles of the major histocompatibility 
complex are recognized by T cells. However, the reason why certain 
specific self-antigens dominate the response and are indispensable 
for triggering autoreactivity is unclear. Spontaneous presentation 
of insulin is essential for initiating autoimmune type 1 diabetes in 
non-obese diabetic mice. A major set of pathogenic CD4 T cells 
specifically recognizes the 12-20 segment of the insulin B-chain 
(B:12-20), an epitope that is generated from direct presentation 
of insulin peptides by antigen-presenting cells. These T cells 
do not respond to antigen-presenting cells that have taken up 
insulin that, after processing, leads to presentation of a different 
segment representing a one-residue shift, B:13-21 (ref. 4). CD4 T cells 
that recognize B:12-20 escape negative selection in the thymus 
and cause diabetes, whereas those that recognize B:13-21 have 
only a minor role in autoimmunity. Although presentation of 
B:12-20 is evident in the islets, insulin-specific germinal centres 
can be formed in various lymphoid tissues, suggesting that insulin 
presentation is widespread. Here we use live imaging to document 
the distribution of insulin recognition by CD4 T cells throughout 
various lymph nodes. Furthermore, we identify catabolized insulin 
peptide fragments containing defined pathogenic epitopes in 
β-cell granules from mice and humans. Upon glucose challenge, 
these fragments are released into the circulation and are recognized 
by CD4 T cells, leading to an activation state that results in 
transcriptional reprogramming and enhanced diabetogenicity. 
Therefore, a tissue such as pancreatic islets, by releasing catabolized 
products, imposes a constant threat to self-tolerance. These 
findings reveal a self-recognition pathway underlying a primary 
autoantigen and provide a foundation for assessing antigenic targets 
that precipitate pathogenic outcomes by systemically sensitizing 
lymphoid tissues. 

On the basis of previous studies demonstrating constrained T-cell 
migration during limited antigen recognition, we investigated insulin 
presentation in peripheral lymph nodes by two-photon microscopy 
of lymph-node explants following transfer of insulin-specific T cells 
(Fig. 1a). These were transferred together with wild-type CD4 T cells 
as a control; each population of transferred cells was labelled with a 
different fluorescent probe (Fig. 1b). We monitored the individual 
trajectories of transplanted T cells within the same region of the lymph 
nodes and quantified their motility (Extended Data Fig. 1a). We first 
performed this assay with a control CD4 T cell (10E11), which recog- 
nizes hen egg lysozyme (HEL). These experiments confirm that limited 
antigen recognition that is insufficient to trigger cell division can be 
detected by a decrease in the mean velocity of T cells (Fig. 1c, Extended 
Data Fig. 1b, Supplementary Video 1). 

Widespread presentation of insulin peptides was demonstrated by 
reduced motility of the B:12-20-specific 8F10 T cells in the pancreatic 
(pLN), inguinal (iLN), mesenteric (mLN) and axillary (aLN) lymph 
nodes of non-obese diabetic (NOD) mice, relative to wild-type CD4 
T cells (Fig. 1d, Supplementary Video 2). Motility was reduced to a 
similar degree on day 1 or day 5 of imaging (Extended Data Fig. 1c), 
and was unaffected by switching the labelling of the fluorescent probes 
(Extended Data Fig. 1d). The diffuse, rather than clustered, pattern of 
motility arrest indicates that presentation of insulin peptides was 
limiting and was not restricted to selected antigen-presenting cells (APCs). 
Motility of 8F10 T cells was also reduced in µMT and Batf3−/− mice, 
which are deficient in B cells and XCR1+ dendritic cells, respectively 
(Extended Data Fig. 1e). 

We performed three experiments to interrogate key parameters of 
antigen recognition by 8F10 T cells. First, we examined B16A mice, 
which are deficient in both Ins1 and Ins2 but express a proinsulin 
transgene with a Tyr16Ala substitution in the B chain. This mutant 
insulin is bioactive but is not immunogenic to B:12-20- or B:13-21- 
specific T cells. There was no reduction in motility of 8F10 T cells in 
the B16A mouse recipients, demonstrating that the effects on T cell 
motility require specific epitope recognition by the 8F10 T cells (Fig. 1e, 
Supplementary Video 3). Second, we investigated whether prior 
recirculation through the pLN was required for insulin recognition in other 
sites. Surgical removal of pLNs did not influence the motility arrest of 
8F10 T cells in the iLNs (Fig. 1f). Third, we detected motility arrest of 
8F10 T cells in diabetes-resistant B6 mice harbouring the I-Ag7 
haplotype (B6g7) (Fig. 1g) but not in NOD mice with the H2b haplotype 
(Extended Data Fig. 1f). Therefore, peripheral insulin presentation to 
8F10 T cells requires I-Ag7 and is not restricted to the NOD strain. 

The motility of 4F7 T cells, which specifically recognize the B:13-21 
epitope, was also markedly reduced in the pLNs and iLNs of NOD 
recipients (Extended Data Fig. 1g). By contrast, the 8.3 CD8 T cells, 
which recognize the islet-specific glucose-6-phosphatase-related 
protein (IGRP, a protein that is expressed in the endoplasmic reticulum 
of β-cells), exhibited reduced motility in the pLN but not in the iLN 
(Extended Data Fig. 1h). Therefore, epitopes of insulin, but not those 
from IGRP, a cell-associated antigen, are systemically available. 

We hypothesized that presentation of the low concentrations of 
circulating insulin (about 40 pM) might require insulin receptor-mediated 
uptake by APCs. To test this, we examined the effects of S961, an insulin 
receptor antagonist. In assays on cultured cells, S961 impaired the 
ability of concanavalin A (ConA)-activated macrophages to present 
insulin (Extended Data Fig. 1i). In vivo blockade of insulin receptor 
by infusion of mice with S961 via osmotic pump caused a sustained 
increase in blood glucose levels (Extended Data Fig. 1j), permitting 
two-photon microscopy analysis (Extended Data Fig. 1k). A significant 
reduction in mean velocity of transferred 4F7 T cells was observed in 
lymph nodes of control mice infused with phosphate-buffered saline 
(PBS) (Fig. 1h, Supplementary Video 4). Although the motility of 4F7 
T cells was also arrested after S961 infusion, the magnitude of the 
reduction was significantly smaller than with PBS (Fig. 1h, Supplementary 
Video 4). Therefore, blockade of insulin receptor-mediated uptake of 
insulin partially abrogated recognition by the 4F7 T cells, suggesting 
that free insulin peptides are an additional source of the B:13-21 
epitope. By contrast, the motility arrest of 8F10 T cells remained at 
a comparable level in recipients infused with PBS or S961 (Fig. 1i, 
Supplementary Video 5), indicating that the presence of B:12-20 is 
independent of insulin receptor-mediated uptake of insulin. This 
epitope must therefore derive from insulin peptides that reach the 
peripheral lymphoid organs. Of note, APCs expressing autoimmune 

Fig. 1 | Peripheral insulin presentation 
Fig. 2 | Generation of insulin peptides in β-cell granules. 
a, b, Immunofluorescence of isolated islets stained for B:9-23 (a) or B:1-30 
(b), CD11c and insulin. Data are representative of 50 islets per group in 
three independent experiments. c, Immunogold electron microscopy 
showing antibodies against B:1-30 (large gold) and insulin (small gold) 
in a representative β-cell. d, e, A representative granule that contains both 
B:1-30 and insulin (d) or insulin only (e) is shown. The arrowhead in 
d indicates the B:1-30 peptide. Data are representative of 317 granules 
analysed in three independent experiments. f, Competitive ELISA showing 
quantification of insulin, B:1-30 and B:9-23 in granules isolated by 
centrifugation of islets from B6g7 mice at 5,000g (5k) or 25,000g (25k). 
Each line represents one paired experiment using 
4-8 mice. *P < 0.05; **P < 0.01; two-tailed paired Student’s t-test. 
g, Peptide coverage of insulin B chain by sequences identified in 25k 
(red) and 5k (blue) β-cell granules using nLC-MS/MS analysis. Each line 
represents the alignment of individual peptides with insulin-2 B:1-30. 
Data are from four independent analyses using islets from 8-10 mice per 
strain. h, Box plot of log2(mass spectrometry peak area), showing the 
abundance of individual insulin B-chain peptides (purple) in the 25k and 
5k granules relative to all insulin peptides, including the C-peptides (box). 
Boxes with dashed outlines denote B:1-30 with a high abundance. Box 
plots show the median, box edges represent the first and third quartiles, 
and the whiskers extend to 1.5 × interquartile range. 
Fig. 3 | Secretion of insulin peptides into the circulation upon glucose 
stimulation. a–c, Insulin (a), B:1-30 (b) and B:9-23 (c) secreted from 
islets of B6g7 mice during one hour after stimulation with 2.5 mM or 
25 mM glucose, quantified by competitive ELISA. Each point represents an 
independent experiment. *P < 0.05; **P < 0.01; ***P < 0.005; two-tailed 
paired Student’s t-test. d, A 10 × 10 dot plot representing the coverage of 
insulin peptide sequences identified by nLC-MS/MS in supernatants of 
cultured islets stimulated with 25 mM glucose. Each dot represents 1% 
coverage of the total. e, Summary of selected insulin peptides containing 
defined immunogenic epitopes. The following epitopes are underlined: 
B:12-20 (red), B:13-21 (blue), B:15-23 (green) and A:14-20 (black). In 
B–C-spanning peptides, the residues of the B chain are shown in bold. 
f, Box plot of log2(peak area) showing abundance of individual B:9-23-
associated peptides (blue), B–C-spanning peptides (red) and the A:14-20 
peptide (cyan) relative to all insulin peptides (box). Box plots show the 
median, box edges represent the first and third quartiles, and the whiskers 
extend to 1.5 × interquartile range. g, The mass spectrum of a peptide 
sequence identified in mouse urine that contains all residues of the insulin 
B:9-23 peptide, with oxidation of cysteine to cysteic acid (c). 


regulator (AIRE) are not a major source of insulin peptides (Extended 
Data Fig. 1l). 

We identified insulin peptides in β-cell granules using peptide-specific 
monoclonal antibodies and mass spectrometry analysis. We 
used the monoclonal antibody AIP, which is specific for B:9-23 (ref. 3), and 
generated a new monoclonal antibody (clone 6F3.B8) by immunization 
with the entire insulin B-chain (B:1-30). The two antibodies were 
not cross-reactive and neither recognized native insulin (Extended 
Data Fig. 2a–d). Notably, presentation of B:1-30 activated 
insulin-reactive T cells without the need for internal processing (Extended 
Data Fig. 2e). 

We previously identified B:9-23 in a set of LAMP1-positive vesicles 
in β-cells. These vesicles are distinct from insulin-containing 
dense core granules and can be separated from them by differential 
centrifugation. They are consistent with the crinophagic bodies 
that result from fusion of the dense core granules to lysosomes, 
and contain peptides that preferentially stimulate 8F10 T cells. 
Immunofluorescence with AIP showed a punctate pattern of B:9-23 
staining in β-cells from NOD.Rag1−/− or B6g7.Rag1−/− mice (Fig. 2a). 
By contrast, B:1-30 staining using 6F3.B8 was more diffuse in nearly 
all the β-cells and co-stained with insulin (Fig. 2b). Using double 
immunogold-labelling antibodies, we detected B:1-30 in granules 
containing insulin (Fig. 2c). Many granules (106 out of 317, 33%) 
contained both B:1-30 and insulin (Fig. 2d), and the rest contained only 
insulin (Fig. 2e). AIP did not stain islets satisfactorily after labelling 
with immunogold. 

Regular secretory granules obtained by centrifugation at 25,000g 
(25k) contained significantly higher amounts of insulin than the frac- 
tion obtained at 5,000g (5k), which includes the crinophagic bodies 
(Fig. 2f). B:1-30 was primarily found in the 25k fraction and not in 
the 5k fraction, but in concentrations about one tenth that of insulin 
(Fig. 2f). By contrast, B:9-23 was significantly more abundant in the 
granules in the 5k fraction (Fig. 2f). 

We analysed the peptidomes of granules prepared from B6g7 
mice, B6 mice or 3-week-old female NOD mice, by nanoflow liquid 
chromatography-tandem mass spectrometry (nLC-MS/MS). In 
all three strains, granules from the 25k fraction mostly contained 
sequences from the insulin C-peptide, the intact B:1-30, and a few 
small peptides from the B chain (Fig. 2g, h, Supplementary Table 1, 
Extended Data Fig. 3a). By contrast, the granules from the 5k fraction 
contained more diverse short sequences from throughout the B chain 
(Fig. 2g, h, Supplementary Table 1). Peptides derived from the 9-23 
region, such as B:9-23 and B:11-23 (Extended Data Fig. 3b), were 
identified exclusively in the 5k granules of all three mouse strains. 
Manual interrogation of unassigned spectra only identified two 
putative hybrid peptides in the 5k granules (Extended Data Fig. 3c), 
a C-peptide-islet amyloid polypeptide (IAPP) fusion, and a fusion of 
the N terminus of the C-peptide of insulin-2 and the C terminus of the 
C-peptide of insulin-1. Peptides from other proteins were present at 
much lower levels in comparison to those from insulin. 

Examination of β-cell granules from human islets revealed a striking 
similarity in the segregation of peptides between 5k and 25k fractions 
to that from the mouse islets (Extended Data Fig. 4a, Supplementary 
Table 1). The human 25k granules contained the intact B chain and a 
limited number of short sequences. The 5k fraction contained many 
short peptides, including a sequence representing B:11-30 (Extended 
Data Fig. 4b), containing the HLA-DQ8-binding B:11-23 determinant, 
which is recognized by peripheral T cells in patients with type 
1 diabetes. 

Islets stimulated with 25 mM glucose secreted insulin (Fig. 3a) along 
with lower concentrations of peptides that were recognized by 6F3.B8 
(Fig. 3b) or AIP (Fig. 3c). Secretion of insulin or insulin peptides was 
not affected when glucose challenge was carried out in the presence of 
protease inhibitors (Extended Data Fig. 5a), indicating that the peptides 
were not generated extracellularly. 

We characterized insulin peptides secreted by β-cells using nLC-MS/MS. 
Most of these peptides were derived from the C-peptide, along with 
B-chain-derived sequences related to the 9-23 region and spanning 
the B-chain–C-peptide (B–C) junction (Fig. 3d, Supplementary 
Table 2). Many of the peptides were identical to or contained 
pathogenic epitopes that were identified using diabetogenic T cells as 
probes (Fig. 3e, Extended Data Fig. 5b–e). The intact B chain 
contained identical sequences to peptides identified in the 25k gran- 
ules, whereas B:9-23 and B:11-23 were identical to peptides in the 5k 
granules (Fig. 3e, Supplementary Table 2). Synthetic versions of pep- 
tides associated with B:9-23 activated T cells specific for B:12-20 as 
well as those specific for B:13-21 (Extended Data Fig. 6). In general, 
these potentially immunogenic peptides were present at low relative 
abundance (Fig. 3f, Supplementary Table 2). 

We identified a form of B:9-23 containing cysteine oxidized to 
cysteic acid in mouse urine using antibody capture (Fig. 3g); this is 
a modification that can occur during sample preparation (Extended 
Data Fig. 7a). This finding indicates that B:9-23 is present in the cir- 
culation. Indeed, fluorochrome-labelled B:9-23 was rapidly displayed 
Fig. 4 | Acquisition of an effector-like phenotype by 8F10 T cells 
during antigen recognition. a, Experimental design for b–f. b, Pearson's 
correlation matrix showing hierarchical clustering of RNA-seq of 8F10 
T cells sourced from NOD or B16A hosts. c, GSEA enrichment plots 
showing a significant correlation (determined by false discovery rate 
(FDR) Q < 0.05) of genes upregulated in the 8F10-NOD samples with 
four hallmark datasets associated with metabolism pathways. d, GSEA 
enrichment plots showing a significant correlation of genes upregulated 


by I-Ag7-expressing APCs in spleen but not in thymus following 
intravenous injection (Extended Data Fig. 7b, c). 

The widespread presentation of insulin peptides in lymphoid tis- 
sues influences the biology of T cells. We generated a bone marrow 
chimaera model in which we transferred a small number of bone mar- 
row stem cells from CD45.2 8F10 mice deficient in T-cell receptor 
alpha chain (8F10.TCRα−/−) into non-lethally irradiated NOD or B16A 
hosts (CD45.1) (Fig. 4a). This resulted in the development of a small 
number of 8F10 T cells (0.5-2%) among the endogenous CD4 T-cell 
repertoire (Extended Data Fig. 8a). We performed RNA sequencing 
analysis (RNA-seq) on isolated 8F10 T cells from iLNs of both hosts 
(Fig. 4a). 

Hierarchical clustering using Pearson's correlation revealed differ- 
ences between the transcriptomes of 8F10 T cells sourced from NOD 
(8F10-NOD) and B16A (8F10-B16A) hosts (Fig. 4b). Gene-set enrich- 
ment analysis (GSEA) showed significant correlations between tran- 
scripts that were upregulated in the 8F10-NOD T cells with biological 
pathways involving oxidative phosphorylation (OXPHOS), Myc targets, 
fatty acid metabolism, mTOR complex 1 (mTORC1) signalling and 
DNA repair (Extended Data Fig. 8b). The four most highly ranked gene 
sets (Fig. 4c) were associated with metabolic pathways, and involved 
transcripts encoding key kinases, intermediates and transcription fac- 
tors (Extended Data Fig. 8c) that have been shown to support T cell 
proliferation and functions. 

This metabolic reprogramming in 8F10 T cells from NOD mice was 
associated with an effector-like phenotype (Fig. 4d). The gene sets that 
were upregulated in these cells are also highly expressed in CD4 T cells 
upon stimulation, in CD8 T cells at the peak of expansion in comparison 
to the contraction phase, and in CD8 effectors in comparison 
to exhausted T cells. There is little overlap among these three sets of 
transcripts (Extended Data Fig. 9a, Fig. 4e). According to GSEA, neither 
T cell set correlated with anergic CD4 or tolerant CD8 T cells 
(Extended Data Fig. 9b). 

Functional analysis at the six-week time point revealed a higher 
capacity for effector cytokine (TNF and IFNγ) production (Extended 
Data Fig. 10a) and cell proliferation (Extended Data Fig. 10b) in 
8F10-NOD T cells. Neither T cell set expressed molecules associated with 


in the 8F10-NOD samples with three immunological signature datasets 
depicting T cell activation and effector function. e, A Venn diagram 
showing the number of overlapping genes among the three gene sets in d. 
f, Incidence of diabetes in NOD.Rag1−/− recipients adoptively transferred 
with 8F10 T cells isolated from the iLNs of NOD or B16A mice six weeks 
after bone marrow transfer. **P < 0.005; log-rank test. Data represent 
cumulative results of three independent transfers. 


anergy and exhaustion (Extended Data Fig. 10c). Similar results 
were obtained for T cells analysed nine weeks after the bone marrow 
transfer (Extended Data Fig. 10d–f). Of note, when the two sets of 
T cells were transferred into NOD.Rag1−/− recipients, the onset of 
diabetes was accelerated by the 8F10-NOD set (Fig. 4f). Therefore, 8F10 
T cells acquired an effector-like phenotype during peripheral anti- 
gen recognition, supported by transcriptional reprogramming and 
increased diabetogenicity. 

In summary, peptide exocytosis is a normal response of B-cells that 
represents a mechanism of communication with the lymphoid tis- 
sues. Similar mechanisms may apply to other endocrine organs that 
also contain crinophagic granules. Examining the released peptides 
may enable better-targeted identification of T cell responses, a set of 
responses that could be extensive, given the diversity of exocytosed 
moieties. Previous studies have shown that ablation of all lymph nodes 
eradicates the pathogenic T cell repertoire and abolishes diabetes, 
emphasizing the importance of the entire lymphatic system in 
interactions with T cells. Finally, the biological outcomes described here for 
8F10 T cells may vary for other insulin-reactive T cells with divergent 
TCR affinities. Comprehensive understanding of these outcomes will 
require analysis of the entire insulin-reactive T cell pool at different 
stages of the disease. 
METHODS 

Mice. NOD/ShiLtJ (NOD), NOD.129S7(B6)-Rag1tm1Mom/J (NOD.Rag1−/−), 
NOD.Cg-Tg(Ins2*Y16A)1EllIns1tm1JjaIns2tm1Jja/GseJ (NOD.B16A), 
NOD.Cg-Tg(TcraTcrbNY8.3)1Pesa/DvsJ (8.3), NOD.129S2(B6)-Ighmtm1Cgn/DoiJ (μMT), 
NOD.C-(Ptprc-D1Mit262)/WehiJ (NOD.CD45.2), NOD.B10Sn-H2b/J (NOD.H2b) 
and B6.NOD-(D17Mit21-D17Mit10)/LtJ (B6g7) mice were originally obtained from 
the Jackson Laboratory. NOD.10E11 TCR transgenic mice (TCRα: TRAV5D-4/ 
TRAJ42; TCRβ: TRBV13-3/TRBD2/TRBJ2-7) were generated using a previous 
protocol. NOD.4E7 TCR transgenic mice, NOD.Aire−/−, and NOD.TCRα−/− mice were 
generated by M.S.A. The 8F10 or 10E11 mice expressing the CD45.2 allotype were 
generated by intercrossing the original TCR transgenic line (CD45.1) with the 
NOD.CD45.2 mice, and the CD45.2.NOD.8F10 mice were further crossed with the 
NOD.TCRα−/− mice to generate the CD45.2 8F10.TCRα−/− mice. B6.Rag1−/− mice were 
intercrossed with B6g7 mice to generate B6g7.Rag1−/− mice. All mice were bred 
and maintained under specific pathogen-free conditions in our animal facility. All 
experiments were approved by the Division of Comparative Medicine of Washington 
University School of Medicine in St. Louis (Accreditation number A3381-01). 
Human pancreatic islets. De-identified human primary islets isolated from 
deceased donors were obtained from Prodo Laboratories. Experiments were judged 
to be ‘not human subject research’ by Washington University Human Research 
Protection Office (IRB ID # 201801183; Federalwide Assurance #FWA00002284). 
In total, islets from three donors were used: donor 1 (female, 57 years, BMI 21.35), 
donor 2 (female, 49 years, BMI 33), donor 3 (male, 28 years, BMI 34.7). Purity of 
the islets was between 85 and 98%. Islets were cultured in CMRL medium 
supplemented with 10% FBS and 50% L-cell conditioned medium for recovery. The 
granules were isolated from ~1500 islets after 1-3 days of culture. 

Antibodies. The following fluorescently conjugated antibodies were purchased 
from BioLegend: anti-B220 (RA3-6B2), anti-CD11c (N418), anti-CD4 (RM4-5), 
anti-CD45 (30-F11), anti-CD45.1 (A20), anti-CD45.2 (104), anti-CD8a (53-6.7), 
anti-F4/80 (BM8), anti-Vβ8.1/8.2 (KJ16-133.18), anti-CD44 (IM7), anti-CD62L 
(MEL-14), anti-CD25 (PC61.5) and anti-TNFα (MP6-XT22). Unconjugated or 
Alexa Fluor 647-labelled rabbit anti-insulin monoclonal antibody (C27C9) was 
purchased from Cell Signaling Technology. Unconjugated mouse anti-insulin 
monoclonal antibody (E11D7) was purchased from Millipore. Alexa Fluor 594 F(ab′)2 
donkey anti-mouse IgG and HRP-conjugated goat anti-mouse IgG (Fcγ-specific) 
were purchased from Jackson ImmunoResearch. 

Flow cytometry and cell sorting. Flow cytometry analysis was done as previously 
described. The samples were examined using a FACSCanto II (BD Biosciences) 
and the data were analysed using FlowJo software (Tree Star Software). CD4+ 
T cells from iLNs were enriched using the CD4+ T cell isolation kit (Miltenyi 
Biotech), and the 8F10 T cells were sorted as 
CD45.2+CD45.1−CD4+CD8−B220−CD11c− using a FACSAria II (BD Biosciences). 

CFSE and CMTMR labelling. For two-photon imaging, naive CD4 T cells were 
purified by two rounds of MACS negative selection using the naive CD4+ T cell 
isolation kit (Miltenyi Biotech) to remove non-CD4 and CD44hi T cells. The 
CD25+ cells were further removed from the flow-through portion using the CD25 
microbead kit (Miltenyi Biotech). The naive 8.3 or wild-type CD8 T cells were 
purified similarly by using the naive CD8+ T cell isolation kit. Flow cytometry 
analysis confirmed that >95% of the cells were CD4+/CD8+CD25−CD62LhiCD44−. 
CFSE (carboxyfluorescein succinimidyl ester) or 5-(and-6)-(((4-chloromethyl) 
benzoyl)amino)tetramethylrhodamine (CMTMR) labelling was performed using 
the Vybrant CFDA SE Cell Tracer Kit and the CellTracker Orange CMTMR Dye 
(both from ThermoFisher Scientific), respectively. In brief, T cells (10^7/ml in 
PBS) were incubated with 10 μM CFSE or 8 μM CMTMR for 25 min at 37°C with 
a gentle shake after 10 min. Under these conditions, T cells were labelled with 
satisfactory intensities without significant cell death. Ice-cold PBS was then added 
to quench the labelling. 

Adoptive transfer. CFSE- or CMTMR-labelled T cells were mixed 1:1 and were 
injected intravenously. For all the two-photon experiments, 2 x 10° T cells with 
either label were transferred. Varying T cell numbers in preliminary experiments 
determined that this amount resulted in a stable 0.5-0.8% reconstitution of the 
transferred T cells in the endogenous CD4 T cell pool, which was sufficient for 
two-photon imaging without causing obvious intraclonal competition. All the 
recipient mice were 3-4-week-old female mice unless otherwise mentioned. For 
experiments in Fig. 4f, FACS-sorted 8F10 T cells from pooled iLNs of 8-10 NOD 
or B16A mice were adoptively transferred intravenously into 4-6-week-old 
NOD.Rag1−/− recipients (10° cells per mouse). 

Two-photon imaging. Lymph nodes were removed, attached to coverslips, placed 
in CO2-independent medium (Gibco) at room temperature and immediately 
imaged in a perfusion chamber to simulate blood flow (36.5°C DMEM; 95% O2 
and 5% CO2). Two-photon microscopy images were collected using a customized 
Leica SP8 Two-Photon Microscope (Leica Microsystems) equipped with a 25×, 
0.95 numerical aperture water-immersion objective and a Mai Tai HP 
DeepSee Laser (Spectra-Physics) tuned to 840 nm. Fluorescence emission was 


guided directly to external hybrid photodetectors (Leica/Hamamatsu). For signal 
separation, we used three separate dichroic beam splitters without bandpass filters 
(Semrock): 484-nm edge BrightLine (FF484-FDi01), 495-nm edge BrightLine 
(FF495-Di03), and 560-nm edge BrightLine (FF560-Di01). The mirrors were 
arranged in dendritic fashion. Stacks were collected with 2.5 μm between images 
with 25-30 images per stack. 

For cell tracking, two or three regions of one lymph node were randomly 
selected and cropped. Cells were tracked manually in 3D volume using Imaris 
8.41 software (Bitplane). We tracked the first 11 time points from each track 
(representing 5 min and 10 velocities between the time points). Each dot represents 
the mean velocity out of the 10 that were tracked. We also calculated the mean- 
dering index and the motility coefficient for each track. Note that we chose tracks 
with the same length since the track length impacts these last two parameters. The 
meandering index and the motility coefficient data are not shown for space 
reasons; however, the results support the velocity data. The mean track 
velocities (μm/min) were calculated for individual tracks as previously described. 
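
The per-track statistics described above can be illustrated with a minimal sketch. It assumes a 0.5 min sampling interval (11 time points spanning 5 min), takes the meandering index as net displacement divided by path length, and takes the motility coefficient as displacement squared divided by 6t for three-dimensional tracks. These are common definitions, not ones stated in the text, and the function name and the example coordinates are hypothetical.

```python
import math

def track_metrics(positions, interval_min=0.5):
    """Summarize one 3D cell track.

    positions: list of (x, y, z) coordinates in micrometres at
    consecutive time points; interval_min is the assumed time
    between frames (0.5 min for 11 points over 5 min).
    """
    # Step lengths between consecutive time points (10 steps for 11 points).
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    velocities = [s / interval_min for s in steps]        # micrometres per minute
    mean_velocity = sum(velocities) / len(velocities)

    path_length = sum(steps)
    displacement = math.dist(positions[0], positions[-1])
    total_time = interval_min * len(steps)

    # Assumed definitions (not taken from the source):
    meandering_index = displacement / path_length if path_length else 0.0
    motility_coefficient = displacement ** 2 / (6 * total_time)
    return mean_velocity, meandering_index, motility_coefficient

# Hypothetical straight track moving 1 micrometre per frame along x.
straight = [(float(i), 0.0, 0.0) for i in range(11)]
v, mi, mc = track_metrics(straight)
```

Fixing every track at 11 time points, as the text does, keeps the last two quantities comparable between tracks because both depend on track duration.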

Surgical removal of pancreatic lymph nodes. NOD mice (3 weeks old) were 
anesthetized with a 4% mixture of isoflurane in oxygen. The two pLNs were exposed by 
gently retracting the spleen, pancreas, stomach and intestines, and were grasped 
with blunt forceps. Using an ophthalmic cautery on low power, the blood ves- 
sels on either side of the pLNs were cauterized and the pLNs were removed. The 
sham surgery was performed with the same procedures except that the pLNs were 
exposed without removal. 

S961 administration. The S961 peptide 
(GSLDESFYDWFERQLGGGSGGSSLEEEWAQIQCEVWGRGCPSY) was synthesized by LifeTein, 
with an intrachain disulphide bridge between Cys33 and Cys40. S961 
(20 nmol/week) or control PBS was loaded into an Alzet osmotic pump (2001 
model, Durect) and inserted subcutaneously in the back of anaesthetized mice 
through an incision between the scapulae. Blood glucose levels were monitored 
twice a day (Chemstrip 
2GP; Roche); mice with a level above 250 mg/dl for two consecutive measurements 
were considered diabetic. 
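
As a small consistency check on the two statements above, the sketch below locates the cysteine residues in the S961 sequence and applies the stated diabetes criterion (two consecutive readings above 250 mg/dl). The function names and the example glucose readings are hypothetical and only illustrate the rule as written.

```python
# S961 sequence as given in the text, with the line-break spaces removed.
S961 = "GSLDESFYDWFERQLGGGSGGSSLEEEWAQIQCEVWGRGCPSY"

def cysteine_positions(sequence):
    """Return 1-based positions of cysteine residues."""
    return [i for i, residue in enumerate(sequence, start=1) if residue == "C"]

def is_diabetic(readings_mg_dl, threshold=250):
    """True if any two consecutive readings both exceed the threshold."""
    pairs = zip(readings_mg_dl, readings_mg_dl[1:])
    return any(a > threshold and b > threshold for a, b in pairs)

positions = cysteine_positions(S961)            # residues joined by the disulphide bridge
example = is_diabetic([180, 260, 240, 255, 270])  # hypothetical readings
```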

Competitive ELISA assay. 96-well ELISA plates were coated with human insulin 
solution (1 μg/well) or peptides B:1-30 or B:9-23 (2 μM), and were blocked with 
3% BSA overnight at 4°C. Soluble competitive inhibitors, including different 
synthetic peptides and biological samples, were pre-incubated with the E11D7 
(100 ng/ml), 6F3.B8 (20 ng/ml), or AIP (4 ng/ml) monoclonal antibodies for 30 min 
and the mixture was added to the plate-bound antigens for 1 h at room temper- 
ature. In the absence of soluble competitive inhibitors, these concentrations of 
the monoclonal antibodies resulted in about a 50% binding to the plate-bound 
antigens. HRP-conjugated goat anti-mouse IgG (1:10000) antibody was then added 
for 1 h; the responses were developed using the OptEIA TMB Substrate (BD). 
The data (A450 nm) were collected using an iMark Microplate Reader (Bio-Rad 
Laboratories). For quantifying the biological samples, each experiment was paired 
with a standard curve in which serially diluted amounts of soluble antigens were 
used to suppress the binding of their cognate monoclonal antibodies to the same 
antigen in the plate-bound form. The degree of inhibition by the biological samples 
was calculated relative to the blocking curve used by the specific antigen using an 
equation generated by linear regression of the standard curve. 
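
The quantification step can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes that percentage inhibition is linear in log10 of competitor concentration over the working range of the standard curve, and the function names and the standard-curve values are hypothetical.

```python
import math

def percent_inhibition(a450_sample, a450_no_inhibitor):
    """Inhibition of antibody binding relative to the uninhibited signal."""
    return 100.0 * (1.0 - a450_sample / a450_no_inhibitor)

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def concentration_from_inhibition(inhibition, slope, intercept):
    """Invert the standard curve (inhibition vs log10 concentration)."""
    return 10 ** ((inhibition - intercept) / slope)

# Hypothetical standard curve: serially diluted soluble antigen (arbitrary units).
standard_conc = [1, 10, 100, 1000]
standard_inhibition = [20.0, 40.0, 60.0, 80.0]
slope, intercept = fit_line([math.log10(c) for c in standard_conc],
                            standard_inhibition)

# A sample giving 50% inhibition is read back from the fitted line.
sample_conc = concentration_from_inhibition(50.0, slope, intercept)
```

Pairing each experiment with its own standard curve, as described, means the slope and intercept are refitted for every plate.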
Immunofluorescence microscopy. Mouse islets were isolated as previously 
described. The islets were blocked with normal goat serum, fixed with 4% 
methanol-free formaldehyde, permeabilized with 0.2% saponin (Sigma), and stained 
with AIP or 6F3.B8 (50 μg/ml) for 45 min on ice. The samples were then stained 
with Alexa Fluor 594 F(ab′)2 donkey anti-mouse IgG (30 μg/ml), Alexa Fluor 647 
rabbit anti-insulin (20 μg/ml), and Alexa Fluor 488 anti-mouse CD11c (40 μg/ml) for 45 min 
on ice and mounted using the Prolong Diamond mountant (ThermoFisher). The 
samples were viewed using the Eclipse E800 microscope (Nikon) equipped with 
the EXi Blue fluorescence camera (Qimaging). 

Electron microscopy with Immunogold. Islets were fixed in 4% paraformaldehyde 
(Polysciences) in 100 mM PIPES, 0.5 mM MgCl2, pH 7.2 for 1 h at 4°C. Samples 
were embedded in 10% gelatin and infiltrated overnight with 2.3 M sucrose/20% 
polyvinyl pyrrolidone in PIPES/MgCl2 at 4°C. Ultrathin sections of 50 nm were 
incubated with a blocking solution supplemented with 5% FBS and 5% normal goat 
serum for 30 min and subsequently incubated with rabbit anti-insulin (C27C9) and 
mouse anti-B chain (6F3.B8) antibodies for 1 h at room temperature. Sections were 
subsequently incubated with goat anti-mouse IgG conjugated to 18 nm colloidal 
gold and goat anti-rabbit IgG antibody conjugated to 12 nm colloidal gold for 1 h. 
Sections were stained with uranyl acetate and lead citrate and viewed on a JEOL 
1200 EX transmission electron microscope (JEOL USA) equipped with an AMT 8- 
megapixel digital camera and AMT Image Capture Engine v.602 software (Advanced 
Microscopy Techniques). All labelling experiments were conducted in parallel with 
controls omitting the primary antibody. These controls were consistently negative. 
Insulin secretion assay. Islets were equilibrated in DMEM supplemented with 10% 
FBS and 5.5 mM glucose for 24 h in 24-well plates. The medium was then replaced 
with 300 μl pre-warmed Krebs-Ringer-HEPES balance solution containing 0.2% 
BSA with 2.5 mM or 25 mM glucose. After 1 h incubation, the culture supernatants 
were collected for the competitive ELISA assay or mass spectrometry analysis. 
β-cell granule isolation. Mouse and human islets were dispersed using 
non-enzymatic dispersion solution (Sigma). Cells were resuspended in PBS and lysed by 
passing them through a cell homogenizer (Isobiotec). The lysate was centrifuged 
twice for 10 min at 500g, 4°C to pellet cell debris. The supernatant was centrifuged 
for 10 min at 5,000g, 4°C. The 5,000g spin was repeated on the supernatant and the 
two pellets were combined. This fraction was highly enriched in peptide-containing 
vesicles compatible with the crinophagic bodies, and has been labelled as such. 
This fraction may also contain organelles other than the insulin-containing ones. 
The supernatant after the 5,000g spin was centrifuged for 30 min at 25,000g, 4°C 
to pellet secretory granules. This supernatant was discarded, and the 25,000g pellet 
was suspended in 100 μl PBS. The microcentrifuge used for granule isolation was 
an Eppendorf 5417R (Eppendorf) with a FA45-24-11 fixed angle rotor. Fractions 
were frozen at −80 °C and thawed at 37 °C for five cycles to release the contents 
of granules. After freeze-thaw, complete protease inhibitor cocktail was added to 
the sample, which was then concentrated by SpeedVac to <100 μl. The sample was 
passed through C18 Ziptips (Pierce) and peptides then were eluted in 0.1% formic 
acid/95% acetonitrile and then dried with a SpeedVac. 

Sample preparation for mass spectrometry analysis. Biological samples were 
treated with 2.5% trifluoroacetic acid (TFA) to a final concentration of 0.36% (v/v), 
and the peptides were purified using the C18 Ziptips, eluted with 0.1% formic acid 
in 95% acetonitrile, and lyophilized. For peptide capture, TFA-adjusted mouse 
urine (12 ml) was cleaned up using C18 Sep Pak cartridges (Waters). The analytes 
retained by the cartridge sorbent were eluted with methanol, lyophilized, and 
reconstituted with 2 ml sterile PBS. The material was then incubated with a 1:1 
mixture of sepharose pre-conjugated with AIP or 6F3.B8 monoclonal antibodies 
(1 ml slurry total) for 72 h at 4°C with gentle rotation. The urine-sepharose 
mixture was poured into a Bio-Rad Econo column, and after extensive washing, 
the antibody-bound material was eluted with 10% acetic acid and lyophilized. 
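
The acidification step above (2.5% TFA stock brought to a final 0.36% v/v) is a simple dilution. A minimal sketch of the arithmetic follows, assuming the stock is added directly to the sample and that volumes are additive; the function name and the 1,000 μl example volume are hypothetical.

```python
def stock_volume_needed(sample_volume_ul, stock_percent=2.5, final_percent=0.36):
    """Volume of stock (in microlitres) to add to a sample so that the
    mixture reaches the final concentration, assuming additive volumes.

    Derived from: stock% * v_stock = final% * (v_sample + v_stock).
    """
    if not 0 < final_percent < stock_percent:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_volume_ul * final_percent / (stock_percent - final_percent)

v_stock = stock_volume_needed(1000.0)              # for a hypothetical 1,000 microlitre sample
achieved = 2.5 * v_stock / (1000.0 + v_stock)      # back-calculated final percentage
```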
Mass spectrometry. A Dionex UltiMate 1000 system (Thermo Scientific) was 
coupled to an Orbitrap Fusion Lumos (Thermo Scientific) through an 
EASY-Spray ion source (Thermo Scientific). Peptide samples were loaded (30 μl/min, 
1 min) onto a trap column (100 μm × 2 cm, 5 μm Acclaim PepMap 100 C18, 
50°C), eluted (300 nl/min) onto an EASY-Spray PepMap RSLC C18 column (2 μm, 
25 cm × 75 μm ID, 50°C, Thermo Scientific) and separated with the following 
gradient, all % Buffer B (0.1% formic acid in ACN): 0-40 min, 2-22%; 40-50 min, 
22-35%; 50-60 min, 35-95%; 60-70 min, isocratic at 95%; 70-71 min, 95-2%, 
71-85 min, isocratic at 2%. Spray voltage was 1900 V, ion transfer tube tempera- 
ture was 275°C, and RF lens was 30%. Mass spectrometry scans were acquired in 
profile mode and MS/MS scans in centroid mode, for ions with charge states 2-7, 
with a cycle time of 3 s. For HCD, mass spectra were recorded from 375-1500 Da 
at 120K resolution (at m/z = 200), and MS/MS was triggered above a threshold 
of 2.5 × 10^4, with quadrupole isolation (1.6 Da) at 30K resolution, and collision 
energy of 30%. Dynamic exclusion was used (35 s). For high SA EThcD, mass 
spectra were acquired from 350-1500 Da at 60K resolution, and MS/MS spectra 
were triggered for ions above a threshold of 5 x 10‘ with quadrupole isolation 
(0.7 Da) at 15K resolution. Fragmentation employed calibrated charge-dependent 
ETD, with SA (40%) applied in the HCD cell. Dynamic exclusion was used (60 s). 
For low SA EThcD, mass spectra were recorded from 375-1500 Da at 120K 
resolution (at m/z = 200), and MS/MS spectra were acquired for ions above a 
minimum intensity threshold of 2.5 × 10^4 at 15K resolution. ETD reaction time 
was fixed at 100 ms, with SA (15%) applied in the HCD collision cell. 
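
The elution gradient listed above can be written as a piecewise-linear schedule. The sketch below encodes the stated breakpoints and returns the percentage of Buffer B at a given time; linear ramps between breakpoints are an assumption, and the names `GRADIENT` and `percent_b` are hypothetical.

```python
# (time in min, % Buffer B) breakpoints taken from the gradient in the text.
GRADIENT = [(0, 2), (40, 22), (50, 35), (60, 95), (70, 95), (71, 2), (85, 2)]

def percent_b(t_min):
    """Return % Buffer B at time t_min, interpolating linearly
    between consecutive breakpoints (assumed ramp shape)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-85 min method")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

mid_ramp = percent_b(20)     # halfway through the 2-22% segment
plateau = percent_b(65)      # isocratic hold at 95%
```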

Mass spectrometry data analysis. Data files were uploaded to PEAKS 8.0 
(Bioinformatics Solutions) for processing, de novo sequencing and database 
searching. Resulting sequences were searched against the UniProt Mouse Proteome 
database (downloaded 8 June 2017; 25,144 entries) with mass error tolerances of 
20 ppm and 0.02 Da for parent and fragment, respectively, no enzyme specificity 
and no fixed or variable modifications. The Common Repository for Adventitious 
Proteins database (www.thegpm.org/crap/) was used to identify contaminant 
proteins. FDR estimation was enabled. Peptides were filtered for −10logP > 20, and 
proteins were filtered for −10logP > 30 and one unique peptide. For all 
experiments, this gave an FDR of <1% at the peptide-spectrum match level. Peptides 
matching to insulin-1 and insulin-2 were manually verified by visual inspection. 
For relative quantification, peak areas for all manually verified peptides were 
exported from PEAKS, normalized to the total ion current, and log2 transformed. 
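
The numerical thresholds in this paragraph can be made concrete with a short sketch. It converts a parts-per-million tolerance into an absolute mass window, converts a −10logP score into the corresponding P value (assuming a base-10 logarithm), and applies total-ion-current normalization followed by a log2 transform. This is not PEAKS code; the function names, the peptide label and the peak-area values are hypothetical.

```python
import math

def ppm_window(mass, tolerance_ppm=20.0):
    """Lower and upper mass bounds for a parts-per-million tolerance."""
    delta = mass * tolerance_ppm / 1e6
    return mass - delta, mass + delta

def score_to_p(score):
    """P value corresponding to a -10*log10(P) score."""
    return 10 ** (-score / 10)

def log2_normalized(peak_areas, total_ion_current):
    """Normalize each peak area to the total ion current, then log2-transform."""
    return {peptide: math.log2(area / total_ion_current)
            for peptide, area in peak_areas.items()}

low, high = ppm_window(1000.0)          # 20 ppm at mass 1,000
p_peptide = score_to_p(20)              # peptide-level filter
p_protein = score_to_p(30)              # protein-level filter
relative = log2_normalized({"hypothetical_peptide": 2.5e5}, 1e6)
```

On this reading, the peptide filter of −10logP > 20 corresponds to P < 0.01 and the protein filter of −10logP > 30 to P < 0.001.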
T cell stimulation and antigen presentation assay. In Extended Data Fig. 1i, 
ConA-stimulated peritoneal macrophages were treated with 0.2 or 1 μM S961 for 
1 h at 37°C and were then cultured with the IIT-3 T cell hybridoma that recognizes 
the 13-21 peptide, in the presence of serially diluted insulin (I9278; Sigma). In 
Extended Data Fig. 2e, C3g7 cells were treated with chloroquine for 2 h at 37°C, 
washed, pulsed with the antigens, and cultured with T cell hybridomas. After incu- 
bation for 18 h, the culture supernatants were assayed for IL-2 production. 

Bone marrow chimaera. The female donor 8F10.TCRα−/− (CD45.2) mice were 
injected intraperitoneally with fluorouracil (200 mg/kg), and bone marrow cells 
were isolated from the femur and tibia on day 5. The cells were adoptively 
transferred into sublethally irradiated (600 rads) 3-week-old female NOD or B16A 
hosts (10^4/mouse). 

RNA-seq analysis. Total RNA was isolated using the Ambion RNAqueous-Micro 
kit (Thermo Fisher Scientific). RNA-seq library preparation and sequencing was 
performed as previously described. The differential expression analysis was done 
with the DESeq2 package (version 1.18.1). Multifactor analysis was used to account 
for donor effect. Specifically, paired 8F10-NOD and 8F10-B16A samples from one 
isolation (four pairs in total) were treated as one donor group. Gene set enrichment 
pathways analysis was done using the Broad Institute’s GSEA software and MSigDB 
Hallmark or C7 immunological signatures databases. The latter included datasets: 
GSE28726™4, GSE1000001_1577_200_UP*°, GSE9650°° and GSE32025”*. All heat 
maps are in log2 scale. The gene expression matrix counts were adjusted for donor
effect with ComBat (sva package) only for heat maps and clustering.

Statistics. Mice were age- and gender-matched. The matched mice were then
randomized and distributed equally into experimental
groups. Power analysis was used to estimate the sample size in some experiments, 
as described in the Reporting Summary. The investigators were not blinded to 
allocation during experiments and outcome assessment. One-way ANOVA 
with Sidak's multiple comparisons test was used to determine significant differ- 
ences among multiple groups with unpaired biological replicates. The two-tailed 
unpaired Student's t-test was used to determine significant differences between 
two groups with unpaired biological replicates. The two-tailed paired Student's 
t-test was used to calculate P values of each pair of independent experiments. The 
log-rank test was used to determine the significant difference of diabetes incidence. 
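
The difference between the unpaired and paired t-tests named above comes down to how the test statistic is built. The sketch below computes only the t statistic and the degrees of freedom, using Python's standard library. The function names and sample values are hypothetical, and converting t into a P value would additionally require the t distribution from a statistics package, which is not shown.

```python
import math
from statistics import mean, stdev, variance


def unpaired_t(a, b):
    """Student's t statistic (pooled variance) for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2  # statistic, degrees of freedom


def paired_t(a, b):
    """Paired t statistic computed on the within-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # statistic, degrees of freedom


# Hypothetical measurements for two groups of three replicates each.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]
t_unpaired, df_unpaired = unpaired_t(group_a, group_b)
t_paired, df_paired = paired_t(group_a, group_b)
```

With the same numbers, the paired statistic is larger in magnitude because pairing removes the between-replicate variation shared by both members of each pair.
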
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The RNA-seq data have been deposited in the Gene Expression 
Omnibus under accession number GSE114824. The mass spectrometry proteomics 
data have been deposited to the ProteomeXchange Consortium via the PRIDE 
partner repository with the dataset identifier PXD009919. 
Extended Data Fig. 1 | Probing peripheral antigen presentation by two-photon
imaging; the motility assay. a, Representative 3D reconstructions
of two-photon z-stacks visualizing CFSE-labelled anti-HEL 10E11 TCR
transgenic and CMTMR-labelled wild-type CD4 T cells in an iLN explant
on day 3 post transfer. Individual T cells were tracked in the area bound by
the dashed line. Right, magnified views of this region, showing movement
of T cells over a 7.5-min time interval. Quantification was performed
over a 5-min interval. Cyan and purple tracks represent 10E11 and wild-type
T cells, respectively. Mice were injected with 10 μg HEL. b, NOD
mice (CD45.1) were injected intraperitoneally with indicated amounts
of HEL. Six hours after injection, naive CFSE-labelled 10E11 (CD45.2)
T cells were transferred. On day 3, CFSE dilution of the transferred T cells
(CD45.2+CD45.1−CD4+Vβ8.1/8.2+) in the iLNs was measured by flow
cytometry. Data are representative of two independent experiments.
c, Mean track velocities of 8F10 and wild-type CD4 T cells in iLNs from
NOD recipients on day 1 or day 5 post transfer. d, CFSE(8F10) plus
CMTMR(WT) or CMTMR(8F10) plus CFSE(WT) T cells were separately
transferred into two cohorts of NOD recipients, and their mean track
velocities in iLNs on day 3 were compared by paired two-photon imaging
analysis. e, Mean track velocities of 8F10 and wild-type CD4 T cells in
NOD.μMT or NOD.Batf3−/− recipients on day 3 post transfer. f, Mean
track velocities of 8F10 and 10E11 T cells in NOD.H2b recipients 24 h post
transfer. g, h, Mean track velocities of 4F7 and wild-type CD4 (g) or 8.3
and wild-type CD8 (h) T cells in NOD recipients on day 3 post transfer.
i, Response (mean ± s.e.m.) of the B:13-21-specific IIT-3 T cells to
ConA-activated peritoneal macrophages treated with or without S961
before insulin pulse. j, Blood glucose levels (mean ± s.e.m.) of 3-week old
NOD mice infused with S961 or PBS via osmotic pumps. k, The scheme
of the experiments in Fig. 1h, i. l, Mean track velocities of 8F10 and
wild-type CD4 T cells in iLNs of Aire−/− recipients. Data summarize two
(c, d, f, l) or three (e, g, h) independent experiments. Each dot represents
an individual T cell track, and the bar denotes the mean. ns, not significant;
**P < 0.0001; one-way ANOVA with Sidak's multiple comparisons test
(c, d, g, h) or two-tailed unpaired Student's t-test (e, f, l).
Extended Data Fig. 2 | Analysis of insulin peptide-specific monoclonal
antibodies and presentation of the intact B-chain. a-c, Competitive
ELISA responses showing the binding of: anti-insulin monoclonal
antibody (E11D7) to plate-bound insulin (a), anti-B:9-23 monoclonal
antibody (AIP) to plate-bound B:9-23 (b), and anti-B:1-30 monoclonal
antibody (6F3.B8) to plate-bound B:1-30 (c) in the presence of serial
dilutions of the indicated soluble antigens as a competitive inhibitor.
Inhibition by a specific soluble antigen indicates the specificity of the
monoclonal antibody to this antigen. d, Competitive ELISA responses
showing the binding of 6F3.B8 to plate-bound B:1-30 in the presence of
soluble unmodified B:1-30 or B:1-30 in which the two cysteines were
changed to serines (B:1-30 C to S). The results indicate that the intrachain
link formed by the cysteines does not influence the specificity of the
6F3.B8 monoclonal antibody. Data are means representing two independent
experiments. e, Responses of the B:13-21-specific IIT-3 (left) or the
B:12-20-specific 9B9 (right) T cell hybridoma to C3g7 APCs treated
with or without 100 μM chloroquine for 2 h and pulsed with indicated
antigens after extensive washes. C3g7 cells are a B cell lymphoma line
expressing I-Ag7, and are used as APCs. The effects of
chloroquine indicate that reactivity to insulin, but not to B:9-23 or B:1-30,
requires internal processing. Data are mean ± s.e.m., representative of two
independent experiments.
Extended Data Fig. 3 | nLC-MS/MS analysis of mouse β-cell granules.
a, Mass spectra of mouse insulin-1 B:1-30 with intramolecular disulphide
bonds (left) and mouse insulin-2 B:1-30 with oxidized methionine in
position 29 (right). b, Mass spectra of mouse insulin B:9-23 (left) and
B:11-23 (right), which were exclusively identified in the 5k granules of
B6g7, B6 and NOD mice. c, Mass spectra of two hybrid peptides identified
in the 5k granules. The sequence (EVEDTPVRSGSNPQM, left) represents
a C-peptide (underlined)-islet amyloid polypeptide (IAPP) fusion, and
the sequence (EVEDPQVAEVARQ, right) represents a fusion of the N
terminus of insulin-2 C-peptide (underlined) with the C terminus of
insulin-1 C-peptide.
Extended Data Fig. 4 | nLC-MS/MS analysis of human β-cell granules.
a, Peptide coverage of insulin B chain identified in human 25k (red) and 5k
(blue) β-cell granules using nLC-MS/MS analysis. Shown is the alignment
of individual peptides (each line) with the human insulin B:1-30 segment.
Data summarize results from four independent runs using human islets
from three individual donors. b, A mass spectrum showing a sequence
representing human insulin B:11-30 that was identified in the 5k granules.
The cysteinylation in position 19 is indicated.
Extended Data Fig. 5 | Analysis of insulin peptides secreted from islets
upon glucose challenge. a, Insulin secretion assay was performed as
described in Fig. 3a-c, except that protease inhibitors were added during
the 25-mM glucose challenge. The supernatants were then collected for the
competitive ELISA assay. Data are mean ± s.e.m. from two independent
experiments. b, Mass spectra of four secreted peptides that contain the
B:12-20 and/or B:13-21 epitopes as listed in Fig. 3e. Secreted B:1-30
sequences are identical to those in Extended Data Fig. 3a, and B:9-23 and
B:11-23 share identical sequences with those in Extended Data Fig. 3b.
c, A mass spectrum of the secreted insulin B:15-23 MHC-I (Kd)-binding
peptide. d, A mass spectrum of the secreted insulin A:14-20 MHC-I
(Db)-binding peptide. e, A mass spectrum showing a representative
B-C-spanning peptide (B25-C23).
Extended Data Fig. 6 | T cell responses to B:9-23-associated peptides. Responses of three insulin-reactive T cell hybridomas to insulin peptides
associated with the 9-23 region of the B chain as identified in Fig. 3e. The C3g7 cells were used as APCs. Data are mean ± s.e.m.
Extended Data Fig. 7 | Characterization of circulating B:9-23 and its
localization into lymphoid organs. a, Unmodified synthetic B:9-23
(3 pmol) was spiked into 1 ml PBS, purified using C18 tips, lyophilized,
and analysed by nLC-MS/MS. The data show the appearance of
unmodified B:9-23 (left) together with oxidation of Cys19 to cysteic
acid (right). b, c, Alexa Fluor 488-conjugated B:9-23 peptide (100 μg)
was injected intravenously into 4-week old B6, B6g7 and NOD mice. An
hour later, spleens and thymi were harvested, digested with liberase and
DNase, and binding to splenic and thymic APCs was measured by flow
cytometry. b, Representative FACS plots showing the binding of B:9-23 to
splenic XCR1+ and Sirpα+ dendritic cell (DC) subsets and B cells (top).
The bar graph summarizes cumulative results from individual mice (each
point), pooled from three independent experiments. ns, not significant;
**P < 0.05; ***P < 0.01; ****P < 0.005, two-tailed unpaired Student’s
t-test. c, Representative FACS plots showing the binding of B:9-23 to
thymic XCR1+ and Sirpα+ DC subsets and to CD45− cells expressing
MHCII. Data are mean ± s.d. from five individual mice per strain from
two independent experiments.
Extended Data Fig. 8 | RNA-seq analysis of 8F10 T cells developed
in NOD or B16A hosts. a, Representative FACS plots (top) showing
the sorting strategy and recovery of 8F10 T cells from iLNs of NOD or
B16A-recipient mice six weeks after adoptive transfer of bone marrow.
The scatter plot (bottom) shows the percentage of recovered 8F10 T cells
among total CD4 T cells from four independent experiments. ns, not
significant; two-tailed paired Student's t-test. b, Biological pathways that
are significantly enriched in the 8F10-NOD versus 8F10-B16A samples
using GSEA and Hallmark database. c, Heat maps of all enriched genes in
individual metabolic pathways depicted in Fig. 4c.
Extended Data Fig. 9 | 8F10 T cells exhibit an effector phenotype, but
no anergy or exhaustion phenotype, at the transcription level during
peripheral antigen recognition. a, Heat maps showing all the enriched
enrichment plots performed on differentially expressed genes in 8F10
T cells from the NOD-iLN versus B16A-iLN condition using datasets
characterizing CD4 T cell anergy and CD8 T cell tolerance.
Extended Data Fig. 10 | Functional analysis of 8F10 T cells developed
in NOD or B16A hosts. a-f, The bone marrow chimaera was constructed
as in Fig. 4a, and T cells were examined after 6 (a-c) or 9 (d-f) weeks.
a, b, d, e, Bulk CD4+ T cells were purified from iLNs of individual NOD or
B16A mice (three per group) by two rounds of MACS negative selection.
To examine cytokine repertoire (a, d), half of the individual T cell samples
were combined. The remainder were kept as individual samples, labelled
with CFSE (1.5 μM), and used to measure cell proliferation (b, e). In either
case, T cells were mixed with NOD.Rag1−/− splenocytes (1:2 ratio) and
stimulated with B:9-23 for 16 (a, d) or 72 (b, e) hours. a, Representative
FACS plots showing intracellular cytokine staining of the 8F10 T cells from
NOD-iLN or B16A-iLN, after stimulation with B:9-23 for 16 h (brefeldin
A was added for the last 4 h). Production of IL-4, IL-17A, IL-5 and IL-10
was not detected. Data are representative of two independent experiments
with 3 mice combined per experiment. b, Representative FACS plots (top)
showing CFSE dilution of the 8F10 T cells stimulated by B:9-23 or the
control HEL11-25 peptide for 72 h. The results of 6 individual mice from
two independent experiments are summarized in the box plots (bottom).
Box plots show the median, box edges represent the first and third
quartiles, and the whiskers extend from the minimum to the maximum.
**P < 0.01, two-tailed unpaired Student’s t-test. c, Representative FACS
plots showing ex vivo surface staining of FR4 and CD73 as well as CD39
and TIGIT on endogenous CD4+ or 8F10 T cells in the iLNs of NOD
or B16A mice. Data are representative of three mice analysed in two
independent experiments. d-f, Experiments were performed in week 9
following the procedures described in a-c. The data in d-f are from a
single experiment.
53BP1-RIF1-shieldin counteracts DSB resection
through CST- and Polα-dependent fill-in


Zachary Mirman!*, Francisca Lottersberger!*, Hiroyuki Takai!, Tatsuya Kibe!, Yi Gong!, Kaori Takai!, Alessandro Bianchi!?, 


Michal Zimmermann!*, Daniel Durocher? & Titia de Lange!* 


In DNA repair, the resection of double-strand breaks dictates
the choice between homology-directed repair (which requires a
3′ overhang) and classical non-homologous end joining, which
can join unresected ends?. BRCA1-mutant cancers show minimal
resection of double-strand breaks, which renders them deficient in
homology-directed repair and sensitive to inhibitors of poly(ADP-ribose)
polymerase 1 (PARP1)?-°. When BRCA1 is absent, the
resection of double-strand breaks is thought to be prevented by
53BP1, RIF1 and the REV7-SHLD1-SHLD2-SHLD3 (shieldin)
complex, and loss of these factors diminishes sensitivity to
PARP1 inhibitors*®*°. Here we address the mechanism by which
53BP1-RIF1-shieldin regulates the generation of recombinogenic
3′ overhangs. We report that CTC1-STN1-TEN1 (CST)!°, a complex
similar to replication protein A that functions as an accessory factor
of polymerase-α (Polα)-primase"!, is a downstream effector in the
53BP1 pathway. CST interacts with shieldin and localizes with Polα
to sites of DNA damage in a 53BP1- and shieldin-dependent manner.
As with loss of 53BP1, RIF1 or shieldin, the depletion of CST leads
to increased resection. In BRCA1-deficient cells, CST blocks RAD51
loading and promotes the efficacy of PARP1 inhibitors. In addition,
Polα inhibition diminishes the effect of PARP1 inhibitors. These
data suggest that CST-Polα-mediated fill-in helps to control the
repair of double-strand breaks by 53BP1, RIF1 and shieldin.

This study was initiated to determine whether the control of
5′ resection at double-strand breaks (DSBs) resembles the regulation of
resection at telomeres. The formation of telomeric t-loops requires the
generation of 3′ overhangs after DNA replication'”"!°. Newly replicated
telomeres are resected by EXO1, which generates 3′ overhangs that
are too long and require fill-in mediated by Polα-primase’® (Fig. 1a).
Polα-primase is brought to telomeres by an interaction between CST
(also known as α-accessory factor, AAF!!”) and POT1B in mouse
shelterin'® (Fig. 1a). Here we test whether CST-Polα fill-in of 3′ overhangs
has a role in the regulation of DSB resection by 53BP1, RIF1
and shieldin.

To study the role of CST at sites of DNA damage we used telomeres
that lack shelterin protection, which are a model system for DSB
resection®!>!8-0. Hyper-resection occurs upon Cre-mediated removal of
TPP1 (and POT1A and POT1B) from telomeres of Tpp1F/F mouse
embryo fibroblasts (MEFs). This hyper-resection is counteracted by
53BP1 and RIF1°, which accumulate in response to ATR signalling
at telomeres that lack POT1A (Fig. 1a). As does 53BP1, shieldin limited
hyper-resection at telomeres that lack TPP1: TPP1-deficient cells
that lacked either REV7 or SHLD2 showed telomere hyper-resection
(Fig. 1b-d, Extended Data Fig. 1a-c).

As CST is essential”, we used short hairpin RNAs (shRNAs) to
explore the role of CST in telomere hyper-resection. Depletion of STN1
or CTC1 increased the telomeric overhang signal in cells that lack TPP1
(Fig. 1b-d, Extended Data Figs. 1d, 2), and tests with Escherichia coli
ExoI confirmed that the signal derived from a 3′ overhang (Extended
Data Fig. 1e, f). Knockdown of STN1 or CTC1 did not affect the
resection at telomeres when TPP1 was deleted from REV7-deficient
cells (Fig. 1b-d, Extended Data Fig. 2). Furthermore, STN1 knockdown
had no effect on telomere hyper-resection when either 53BP1 or RIF1
was absent, or when cells contained a form of 53BP1 that does not
recruit RIF1*? (Extended Data Fig. 3). These data suggest that CST
acts in a 53BP1-, RIF1- and shieldin-dependent manner to limit the
formation of single-stranded DNA at dysfunctional telomeres.

To determine whether CST also counteracted resection at sites
of ATM signalling, we used conditional deletion of Trf2 (Fig. 1e).
Telomeres that lack TRF2 undergo fusion mediated by classical
non-homologous end joining”**. In cells deficient in DNA ligase IV
(LIG4), in which such telomere fusions are prevented”°, telomeres that
lack TRF2 undergo 5′-end resection that is exacerbated by loss of 53BP1
or RIF1®!° (Fig. 1e). Similarly, the 5′-end resection was increased by
REV7 or SHLD2 deficiency (Fig. 1f-h, Extended Data Fig. 4). When
STN1 was depleted from cells that lack TRF2, resection at telomeres
was significantly increased (Fig. 1f-h) and this effect was epistatic
with REV7 (Fig. 1f-h). Thus, CST counteracts resection in a
shieldin-dependent manner in the context of ATM signalling.

We next determined whether CST localized to damaged telomeres
in a 53BP1- and shieldin-dependent manner. Myc-tagged CTC1 was
detectable at telomeres with functional shelterin, whereas in cells that
are deficient in POT1B (which show extended telomeric 3′ overhangs
but no DNA damage signalling?”) the localization of CTC1 at
telomeres was minimal (Fig. 2a, b). When ATR was activated by deletion
of Tpp1 (Fig. 2a, right), CTC1 was again detectable at telomeres
(Fig. 2a, b) despite the absence of POT1B. The recruitment of CTC1 to
dysfunctional telomeres depended on ATR signalling, 53BP1 and shieldin
(Fig. 2b, c). Similarly, Cre-mediated deletion of the single human
POT1 protein from conditional POT1-knockout HT1080 cells”* led to
telomeric accumulation of STN1 that required ATR kinase (Fig. 2d-f).
Thus, CST localizes to damaged telomeres in a shieldin-dependent
manner.

Co-immunoprecipitation experiments showed that shieldin com- 
ponents could associate with CST (Fig. 2g, Extended Data Fig. 5a). In 
a yeast two-hybrid assay, CTC1 robustly interacted with SHLD1 and 
STN1 interacted with SHLD3 (Fig. 2h, Extended Data Fig. 5b). Weaker
interactions were detectable between TEN1 and SHLD3; STN1 and 
SHLD1, SHLD2 or REV7; and CTC1 and REV7. Thus, shieldin binds 
CST through multiple direct interactions. 

In human cells, STN1 co-localized with 53BP1 at DSBs induced 
by ionizing radiation (Fig. 3a, b) in a manner dependent on shieldin 
(Fig. 3b). Furthermore, STN1 was detectable at DSBs induced by FOKI 
in U2OS cells. This localization was diminished upon inhibition of 
ATM and ATR signalling, and required 53BP1 and shieldin (Fig. 3c-e, 
Extended Data Fig. 6a), which indicates that CST is recruited to sites 
of DNA damage by shieldin. 

Because CST is associated with Polα-primase, we examined the
localization of Polα at DSBs. Because Polα forms numerous S-phase
foci (Extended Data Fig. 6b), we examined cells arrested in G2 (Fig. 3f,
Fig. 1 | Shieldin and CST counteract resection at dysfunctional
telomeres. a, Left, schematic showing POT1B-bound CST counteracting
the resection of telomere ends. Right, depiction of telomeres that lack
TPP1, POT1A and POT1B as a proxy for DSB resection. Telomeres that
lack TPP1 undergo ATR-dependent hyper-resection that is repressed by
53BP1. b, Immunoblots showing loss of REV7 and STN1 in the indicated
Tpp1F/F Rev7+/+ (Rev7 is also known as Mad2l2) MEFs and Tpp1F/F Rev7−/−
(CRISPR) clones treated with Cre (96 h) and/or Stn1 shRNA (shStn1) as
indicated. CHK1-P serves as a proxy for TPP1 deletion. c, Quantitative
analysis of telomere end resection in the cells shown in b using in-gel
hybridization to detect the 3′ overhang (top), followed by rehybridization
to the denatured DNA in the same gel (bottom) to determine the ratio
of single-stranded (ss) TTAGGG to total TTAGGG signal. Representative
of four experiments. d, Quantification of resection detected in


Extended Data Fig. 6c). In cells that expressed HA-tagged STN1, Polα
co-localized with STN1 at DSBs induced by FOKI (Fig. 3f, Extended
Data Fig. 6c). The localization of Polα to DSBs was diminished upon
inhibition of ATM and ATR signalling, and required 53BP1 and shieldin
(Fig. 3f, Extended Data Fig. 6d), which demonstrates that Polα and
CST require the same factors for their localization to DSBs.

Depletion of STN1 increased the percentage of cells that contained 
replication protein A (RPA) foci after exposure to ionizing radiation 
(Fig. 3g, i), increased the signal intensity of the RPA foci (Fig. 3h) 
and increased the overall RPA signal intensity per nucleus (Extended 
Data Fig. 7). Furthermore, deletion of CTC1 from a human HCT116 
cell line”! led to an increase in the phosphorylation of RPA upon 
irradiation (Fig. 3j) and CST depletion increased phosphorylation 
of RPA in irradiated MEFs (Fig. 3k). In cells that lack BRCA1, deple- 
tion of CST also increased the percentage of cells with RAD51 foci 
induced by ionizing radiation (Fig. 3l, m), which suggests that this
depletion restores homology-directed repair. Conversely, on the basis 
of an assay for the fusion of telomeres that lack TRF2”%, it appears 
that the depletion of CST diminishes classical non-homologous end 
joining (Fig. 3n, o). 

BRCA1-deficient cells become resistant to treatment with PARP1
inhibitors (PARPi) when 53BP1, RIF1 or shieldin are absent>~’. 


c, determined from four independent experiments (different shades of
grey) and showing mean and s.d. Three independent Rev7 knockout clones
were used (distinct symbols). e, Telomeres that lack TRF2 as a model for
resection upon ATM activation. f, Immunoblots showing Cre-mediated
deletion of TRF2 from Trf2F/F Lig4−/− (Trf2 is also known as Terf2) cells,
CRISPR deletion of Rev7, shRNA-mediated reduction of STN1 and CHK2
phosphorylation. Asterisk, non-specific band. g, h, Telomere end resection
analysis on the cells in f, as in c, d. Mean and s.d. from four independent
experiments using two clones of each genotype. Note that the order of
the samples is different in h when compared to f, g. All data panels in
the figure are representative of four experiments. Mean indicated with
centre bars and s.d. with error bars. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001, NS, not significant, two-tailed Welch's t-test.


Similarly, STN1 or CTC1 depletion from Brca1F/F MEFs reduced the
lethality of PARPi in BRCA1-deficient cells (Fig. 4a, b, Extended Data
Fig. 8a-f). By contrast, in Brca1F/F subclones that lack 53BP1 or REV7,
depletion of CTC1 or STN1 did not affect resistance to PARPi (Fig. 4c, 
Extended Data Fig. 8c-f). Furthermore, in BRCA1-deficient cells, CST 
depletion reduced the radial chromosomes induced by PARPi (Fig. 4d, e) 
and this effect was epistatic with 53BP1 and REV7 (Fig. 4e). These data 
are consistent with CST acting together with 53BP1 and shieldin to 
minimize formation of single-stranded DNA at DSBs. 

To examine the consequences of Polα inhibition in PARPi-treated
BRCA1-deficient cells without confounding S-phase effects, cells were
arrested in G2 before addition of Polα inhibitors (Fig. 4f). Cells that
experienced Polα inhibition in G2 showed reduced formation of radial
chromosomes (Fig. 4f, Extended Data Fig. 8g). BrdU incorporation
experiments confirmed that the mitotic cells we collected had passed
through S phase during treatment with PARPi (Extended Data Fig. 8h-j).
The effect of Polα inhibition with 10 μM CD437 was not exacerbated
by depletion of CST (Fig. 4f). Collectively, these data are consistent
with CST and Polα acting to limit the formation of recombinogenic 3′
overhangs at DSBs in BRCA1-deficient cells (Fig. 4g).

Our data suggest a sophisticated mechanism by which 53BP1 and 
shieldin act together with CST and Polα to fill in resected DSBs. At
Fig. 2 | 53BP1- and shieldin-dependent localization of CST to
dysfunctional telomeres. a, Left, representative immunofluorescence and
fluorescence in situ hybridization (IF-FISH) for 6×Myc-tagged CTC1
(red) at telomeres (false-coloured in green) in Tpp1F/F MEFs before and
after Cre (96 h). Arrowheads, CTC1 at telomeres. Pot1b−/− cells control for
spurious telomere-CTC1 co-localization. Right, the same nuclei showing
γH2AX (red) at telomeres that lack TPP1. The γH2AX and CTC1 signals
are both false-coloured in red. Arrows, telomeres with CTC1 and γH2AX.
b, Quantification of the percentage of telomeres co-localizing with CTC1,
detected as in a. Each dot represents one nucleus from the indicated Tpp1F/F
cell lines with and without Cre and/or ATR inhibitor (ATRi). 53bp1 is also
known as Trp53bp1. Mean and s.d. from three independent experiments.
c, As in b, but using Tpp1F/F cells treated with a Shld2 single-guide RNA
(sgRNA) (sgShld2) or a control sgRNA (sgCtrl). Mean and s.d. as in
b. d, Immunoblots for POT1 deletion, ATR knockdown and


telomeres, the POT1-TPP1 heterodimer recruits CST-Polα-primase
to fill in part of the 3′ overhang formed after telomere end resection
(Fig. 4g). We propose that at sites of DNA damage, shieldin recruits
CST-Polα-primase for the purpose of filling in resected DSBs. In both
settings, CST is tethered, allowing CST to engage single-stranded DNA
despite its modest affinity for this substrate” and enabling regulation
of the fill-in reaction through recruitment. Recent data have shown
that 53BP1 represses mutagenic single-strand annealing, possibly by
preventing excessive resection*”. Our findings regarding CST-Polα
could explain this observation. At telomeres, partial fill-in by
CST-Polα counteracts hyper-resection but leaves a 3′ overhang that can
form a t-loop, a process similar to the initiation of homology-directed
repair’. At DSBs, CST-Polα could similarly counteract hyper-resection
(and thus single-strand annealing) and generate a 3′ overhang
that is sufficient for homology-directed repair. In BRCA1-deficient
cells, this fill-in reaction, together with the persistence of CST-shieldin
at the DSBs, could block homology-directed repair and result in
lethal mis-repair.
HA-tagged STN1 (6HA-STN1) in conditional POT1-knockout human
HT1080 cells. Asterisk, non-specific band. sh1 and sh2 denote two
different ATR shRNAs (see Methods for details). e, IF-FISH showing
telomeric DNA co-localizing with STN1 in cells as in d, treated with
Cre (96 h) and ATR shRNAs. f, Quantification of STN1 localization at
telomeres before and after POT1 deletion, with or without ATR shRNAs
as in e. Mean and s.d. from three independent experiments. Each symbol
represents one nucleus. g, Immunoprecipitation of human CST (each
subunit Myc-tagged) with Flag-tagged human SHLD1 or REV7
co-expressed in 293T cells. h, Yeast two-hybrid assay for interaction between
CST and shieldin subunits. All data panels in the figure are representative
of three experiments. Mean indicated with centre bars and s.d. with error
bars. GAD, Gal4 activation domain; GBD, Gal4 DNA-binding domain;
Vec, empty vector. **P < 0.01, ***P < 0.001, ****P < 0.0001, two-tailed
Welch's t-test.


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0324-7.
Fig. 3 | CST localizes to DSBs and represses formation of single-stranded
DNA. a, Immunofluorescence for 53BP1 and HA-STN1 in
HT1080 cells treated with ionizing radiation. b, Quantification of 53BP1-STN1
co-localization as in a, in cells with the indicated sgRNAs. Mean and
s.d. from three independent experiments, >15 nuclei per experiment for
each experimental setting. c, Immunoblots for the indicated proteins in
FOKI-LacI U2OS cells treated with the indicated sgRNAs. 53BP1 is also
known as TRP53BP1. d, Immunofluorescence for mCherry-FOKI (red)
and HA-STN1 (green) in FOKI-LacI U2OS cells as in c. e, Examples of
HA-STN1 co-localizing with FOKI foci in cells as in d, treated with ATM
inhibitor (ATMi) and ATRi, or the indicated sgRNAs and quantification
of STN1-FOKI co-localization. Mean and s.d. from three independent
experiments with >80 induced nuclei analysed for each condition.
f, As in e, but monitoring Polα at DSBs in G2-arrested cells that express
HA-STN1. g, Immunoblot for STN1 knockdown in MEFs that express
Myc-RPA32. sh1 and sh2 denote two different Stn1 shRNAs (see Methods
for details). h, Immunofluorescence for Myc-RPA32 after 10 Gy ionizing
radiation (6 h) in MEFs. i, Quantification of cells with RPA foci as in h,
in >30 nuclei for each condition in four independent experiments (grey
shading) with mean and s.d. j, k, Immunoblots for ionizing radiation-induced
RPA phosphorylation (pRPA32, phosphorylated at Ser4 and Ser8)
after deletion of CTC1 from human cells (j) or after depletion of STN1,
Fig. 4 | CST and Polα affect the outcome of PARPi in BRCA1-deficient
cells. a, Colonies detected in a PARPi survival assay using Brca1F/F MEFs
with or without Cre and Ctc1 or Stn1 shRNAs. b, Graphical representation
of data in a from three independent experiments. c, Epistasis analysis
of PARPi resistance induced by the absence of 53BP1 or REV7, and
depletion of CST subunits. Mean (symbol) and s.e.m. (error bars) from
three independent experiments. d, PARPi-induced radial chromosomes
in BRCA1-deficient cells. Scale bars, 1 μm. e, Mean (centre bar) and
s.d. (error bars) of percentage of misrejoined (radial) chromosomes in
>10 metaphases per experimental setting for each of three independent
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METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cell culture and expression constructs. Brca1F/F and Trf2F/F Lig4−/− MEFs were
derived from Brca1F/F mice*!, Trf2F/F mice”® and Lig4+/− mice*? by standard crosses.
Mice were housed and cared for under Rockefeller University IACUC protocol 
16865-H at the Rockefeller University’s Comparative Bioscience Center, which 
provides animal care according to NIH guidelines. MEFs were isolated from 
embryonic-day 12.5 embryos and immortalized with pBabeSV40 large T antigen 
(a gift from G. Hannon) at early passage (P2/3), as previously described”®. 
Genotypes were determined by Transnetyx, using real time PCR with allele- 
specific probes. Tpp LS, Tpp L/53bp1~/~ (ref. *°), Top Rif LY or Tpp Rif L’* 
(ref. 2°) and Pot1b°#°/S!? (ref. 2?) MEFs have previously been described. MEFs 
and U20S cells were cultured in Dulbecco’s modified Eagle medium (DMEM, 
Corning) supplemented with 15% fetal bovine serum (FBS) (Gibco), non-essential 
amino acids (Gibco), 2 mM t-glutamine (Gibco), 100 U/ml penicillin, 100 jxg/ml 
streptomycin (Gibco), 50 11M 3-mercaptoethanol (Sigma). 293T, Phoenix and 
conditional POT1-knockout HT1080 clone c5*8 cells were cultured in DMEM 
supplemented with 10% bovine calf serum (BCS), non-essential amino acids, 
L-glutamine and penicillin-streptomycin as above. For most Cre-mediated gene 
deletion experiments (see exceptions below), retroviral infections with pMMP 
Hit & Run Cre were repeated three times”®. Time points of cell collection indicate 
hours after the second Cre infection. 

U2OS cells containing a LacO array and a tamoxifen- and Shield1-regulated
mCherry-FOKI-LacI fusion were used as described**. Cells were collected 4 h after
induction of FOKI by addition of 0.1 μM Shield1 and 10 μg/ml 4-OHT. Human
CTC1 HCT116 cells?! were cultured in McCoy’s 5A supplemented with 10%
FCS, non-essential amino acids, L-glutamine and penicillin-streptomycin as above.
CTC1 gene deletion was induced with 0.5 μM 4-OHT for 5 h. Gene deletion was
confirmed by western blot using anti-CTC1 antibody (MABE1103, Millipore).

Mouse CTC1 tagged at the N terminus with a 6×Myc tag was delivered by
retroviral transduction using pLPC or pWZL retroviral vectors. Human STN1
tagged at the N terminus with a 6×HA tag was delivered using the pLPC vector.
Myc-tagged RPA32°», and 53BP1 wild type and 53BP1ΔRif1 constructs were as
described. Retroviral gene delivery was performed as described!°.

RNA depletion with shRNAs in pLKO.1 (Open Biosystems) was performed using
the following shRNA target sites: shStn1 1: 5′-GATCCTGTGTTTCTAGCCTTT-3′
(TRCN0000180836, Sigma); shStn1 2: 5′-GCTGTCATCAGCGTGAAAGAA-3′
(TRCN0000184261, Sigma); shCtc1: 5′-CGGCAGATCACAGCATGATAA-3′;
shAtr 1: 5′-CTGTGGTTGTATCTGTTCAAT-3′ (TRCN0000039613, Sigma); shAtr
2: 5′-GATGAACACATGGGATATTTA-3′ (TRCN0000196538, Sigma). Lentiviral
constructs were co-transfected with packaging vectors into 293T cells and cells
infected with the viral supernatant were selected in puromycin as described'®.

Drug treatments were as follows. ATR inhibition: 2.5 μM ETP-46464 (Sigma),
24 h; PARP1 inhibition: 0.1-10 μM Olaparib (Selleck Chemicals), 24 h; G2 arrest:
9 μM RO-3306 (Sigma), 12 h; polymerase α inhibition: 2.5 or 10 μM CD437 (Sigma)
or 2 μM Aphidicolin, 4 h. ATM and ATR inhibition: 10 μM KU55933 (Selleck
Chemical) with ATR inhibition as above for 4 h during induction of FOKI nuclease.
CRISPR-Cas9 gene disruption. Clonal cell lines with disruption of mouse
53BP1 were generated using Cas9 vector (Addgene) and sgRNA (sg53bp1 (2),
5′-GAGAATCTTCTATTATC-(PAM)-3′; sg53bp1 (3),
5′-GCATCTGCAGATTAGGA-(PAM)-3”°) delivered by nucleofection (Amaxa Kit R, Lonza).
Clones were screened by immunoblotting and bi-allelic gene disruption was
verified by Sanger sequencing of Topo-cloned PCR products of the relevant
locus (sequences available on request). Clonal cell lines with mouse Rev7 gene
disruption were isolated similarly using the following sgRNAs: sgRev7(2),
5′-GTGTCCCCACCACAGTGG-(PAM)-3′; and sgRev7(3),
5′-GCCGGTTCAGGTGAGCCC-(PAM)-3′ (disrupted gene sequences available on request).
Oligonucleotides were purchased from Sigma-Aldrich and cloned into the
AflII-digested gRNA expression vector (Addgene) by Gibson Assembly (New England
Biolabs). For isolation of populations with CRISPR-Cas9 disruption of mouse
Shld2 (FAM35A), 293T cells were transfected with lentiCrispr-v2-Shld2-sgRNA
(5′-ATCAGTCAGATCCCTGCGTT-(PAM)-3′) or the vector control. The lentiviral
supernatant was used for infection of Tpp1F/F or Trf2F/FLig4−/− MEFs; infections were
done six times at 6-12 h intervals. Infected cells were then selected in puromycin
for 3-5 days before Cre infection.

FOKI-LacI U2OS cells were infected with the 6×HA-tagged human STN1
retrovirus and selected in puromycin. Subsequently, cells were subjected to
lentiviral infection with lentiCrispr-v2 carrying sgRNA for human 53BP1, SHLD2
or REV7 and selected in blasticidin for 3 days. Target sequences for gene
disruption are as follows: human 53BP1 sgRNA1 (5′-CAGAATCATCCTCTAGAACC-(PAM)-3′),
53BP1 sgRNA2 (5′-TTGATCTCACTTGTGATTCG-(PAM)-3′),
SHLD2 sgRNA1 (5′-TCTGGAGAACCAATAGATTC-(PAM)-3′),
SHLD2 sgRNA2 (5′-TTTGAGCTAAAAAAGCAACC-(PAM)-3′),
REV7 sgRNA1 (5′-CCTCAACTTTGGCCAAGGTA-(PAM)-3′),
REV7 sgRNA2 (5′-TATACTGATTCAGCTCCGGG-(PAM)-3′). For each gene the two sgRNAs
were either used individually or together.

In-gel analysis of single-stranded telomeric DNA. Mouse telomeric overhang
and telomeric restriction fragment patterns were analysed 96-120 h after Cre
treatment by in-gel hybridization with a γ-32P-ATP end-labelled (AACCCT)4 probe, as
previously described”®. Treatment with E. coli exonuclease I before MboI digestion
was used to verify the 3′ terminal position of the single-stranded DNA as
previously described”’. ImageQuant software was used to quantify the single-stranded
telomere overhang signals and the signal from total telomeric DNA in the same
lane in the denatured gel. In each experiment, the ratio of overhang signal to
total telomeric signal was set to 1 for lanes not treated with Cre or shRNA, and
the ratios for the treated samples are given relative to this control.
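
The normalization described above can be sketched in a few lines. This is only an illustration of the arithmetic (single-stranded signal divided by total telomeric signal, then scaled so the untreated lane equals 1); the function name, lane names and intensity values are hypothetical and are not data or code from the study.

```python
def relative_overhang(ss_signal, total_signal, control_lane):
    """Return ss/total ratios per lane, scaled so the control lane equals 1.

    ss_signal and total_signal map lane name -> quantified band intensity
    (native-gel overhang signal and denatured-gel total telomeric signal).
    """
    # Ratio of single-stranded signal to total telomeric signal, lane by lane.
    ratios = {lane: ss_signal[lane] / total_signal[lane] for lane in ss_signal}
    # Express every lane relative to the untreated control lane.
    control_ratio = ratios[control_lane]
    return {lane: ratio / control_ratio for lane, ratio in ratios.items()}


# Hypothetical intensities in arbitrary units (assumed for illustration only).
ss = {"untreated": 200.0, "Cre": 900.0, "Cre+shStn1": 1500.0}
total = {"untreated": 1000.0, "Cre": 1000.0, "Cre+shStn1": 1250.0}
rel = relative_overhang(ss, total, "untreated")
```

With these made-up numbers the control lane comes out as 1 and the treated lanes are reported as fold changes relative to it, which is how the relative values under the gel lanes are read.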

Flow cytometry. FACS was performed as previously described** with gating. 
Immunoblotting. Immunoblotting was performed as described!® with the 
following antibodies: 53BP1 (175933, Abcam; NB100-304, Novus Biological);
ATR (sc-1887, Santa Cruz Biotechnology); BRCA1 (MAB22101, R&D Systems);
CHK1 (sc-8408, Santa Cruz Biotechnology); CHK1-S345-P (#23418; Cell Signaling
Technology); CHK2 (BD 611570, BD Biosciences); Flag-tag (M2, Sigma; F1804,
Sigma); γ-tubulin (GTU488, Sigma); MAD2L2/REV7 (ab180579, Abcam); Myc-tag
(9B11, Cell Signaling Technology); OBFC1/STN1 (E10-376450, Santa Cruz
Biotechnology). Tagged TEN1 was not detectable by immunoblotting of
transfected 293T cells.

For detection of RPA phosphorylation, conditional CTC1 HCT116 cells or MEFs
were irradiated and collected 3 h later. Cells were washed in PBS, and then collected 
by scraping in Laemmli sample buffer, boiling for 5 min and shearing through a 
syringe. Proteins were separated by SDS-PAGE on 8-16% Tris-glycine gradient 
gels (Invitrogen), and transferred to nitrocellulose overnight. Immunoblotting for 
pRPA followed standard protocols with blocking in 5% milk/TBST and the pRPA 
antibody (S4/S8; Bethyl) diluted 1:1,000 in 1% milk/TBST. 
Immunoprecipitation. Immunoprecipitation was carried out as described’.
The following plasmids were used: pLPC-flag-POT1a and pLPC-flag-POT1b”;
pLPC-myc-mouse Ctc1, pLPC-myc-mouse Stn1 and pLPC-myc-mouse Ten1!°;
pLPC-myc-human Ctc1, pLPC-myc-human Stn1, pLPC-myc-human Ten1,
pCDNA5-flag-human Shld1 (C20orf196), pLPC-myc-hRev7 and pLPC-flag-mouse
Shld1 (orthologue of C20orf196). Human REV7, SHLD1, CTC1, STN1 and TEN1
ORFs were generated by PCR and mouse Shld1 was generated by RT-PCR.
Co-transfection of Shld1 and Rev7 with CST in 293T cells was performed
using calcium phosphate co-precipitation. Lysates were prepared in lysis buffer
containing 50 mM Tris-HCl (pH 7.4), 150 mM NaCl, 10% glycerol, 0.1%
NP-40, Complete protease inhibitor mix (Roche), PhosSTOP phosphatase
inhibitor mix (Roche) and 50 U benzonase.

Yeast two-hybrid assays. For yeast two-hybrid analysis, full-length versions 
of human CST and shieldin components were cloned into the NdeI site of the
pGBKT7 and pGADT7 vectors (Clontech). Plasmids in the indicated pairwise 
combinations were co-transformed into budding yeast strain PJ69-4A (MATa 
trp1-901 leu2-3,112 ura3-52 his3-200 gal4 gal80 LYS2::GAL1-HIS3 GAL2-ADE2 
met2::GAL7-lacZ) and selected on synthetic complete drop-out medium lacking 
tryptophan and leucine. Protein interactions were tested by plating on the same 
medium but also lacking adenine. 

Immunofluorescence and IF-FISH. Previously published procedures were
followed for immunofluorescence and IF-FISH'*. Immunofluorescence for
Myc-tagged RPA32 or CTC1 (mouse monoclonal, 9B11 or rabbit monoclonal, 71D10,
Cell Signaling Technology), HA-tagged STN1 (3724, Cell Signaling Technology),
endogenous Polα (sc-137021, Santa Cruz), and 53BP1 (612522, BD Biosciences)
was carried out using the cytoskeleton extraction protocol*”. Intensity
measurements of RPA32-Myc immunofluorescence were performed in FIJI as follows:
nuclei were identified using thresholding, segmented and identified as regions of
interest. The average image background was then subtracted from the image, and
the total raw pixel intensity within each area of interest in the channel of interest
was calculated. RAD51 (70-001, Bioacademia), and γH2AX (05636, Millipore)
were detected in cells fixed in 3% PFA, and foci showing co-localization of RAD51
with γH2AX were quantified. Immunofluorescence imaging was performed on
a Zeiss Axioplan II microscope equipped with a Hamamatsu C4742-95 camera
using Volocity software or on a DeltaVision (Applied Precision) equipped with
a cooled charge-coupled device camera (DV Elite CMOS Camera), a PlanApo
60× 1.42 NA objective or 100× 1.40 NA objective (Olympus America), and
SoftWoRx software.
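
The per-nucleus intensity measurement described above was performed in FIJI. As a rough illustration of the same sequence of steps (threshold, segment, subtract background, sum per region), the following Python sketch works on small nested lists. The threshold value, the use of 4-connectivity and the estimate of background as the mean of all non-nuclear pixels are assumptions made for the example, not settings reported in the methods.

```python
from collections import deque


def label_nuclei(mask):
    """Label 4-connected True regions of a 2D mask; 0 marks non-nuclear pixels."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one nucleus
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def nuclear_intensities(nuclear_channel, signal_channel, threshold):
    """Background-subtracted total signal for each thresholded nucleus."""
    mask = [[value > threshold for value in row] for row in nuclear_channel]
    labels, count = label_nuclei(mask)
    rows, cols = len(mask), len(mask[0])
    # Assumed background estimate: mean signal over all non-nuclear pixels.
    outside = [signal_channel[r][c] for r in range(rows) for c in range(cols)
               if labels[r][c] == 0]
    background = sum(outside) / len(outside) if outside else 0.0
    totals = {label: 0.0 for label in range(1, count + 1)}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                totals[labels[r][c]] += signal_channel[r][c] - background
    return totals


# Toy 4 x 5 images with two "nuclei" (hypothetical pixel values).
nuclear = [[0, 0, 0, 0, 0],
           [0, 9, 9, 0, 0],
           [0, 9, 9, 0, 8],
           [0, 0, 0, 0, 8]]
signal = [[1, 1, 1, 1, 1],
          [1, 5, 5, 1, 1],
          [1, 5, 5, 1, 3],
          [1, 1, 1, 1, 3]]
totals = nuclear_intensities(nuclear, signal, threshold=4)
```

Each entry of `totals` corresponds to one nucleus, which matches the per-nucleus symbols plotted in the intensity quantification.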

Telomere fusion assays. SV40LT-immortalized Trf2/Rosa® cells were infected
with Stn1 shRNA (or the empty vector) and 24 h later Cre was induced for 24 h
with 4-OHT. Cells were collected, counted (to rule out a proliferation defect) and
processed for telomeric FISH on metaphases 72 h after Cre induction. This early
time point was selected to avoid any effect of the Stn1 shRNA on proliferation, as
diminished proliferation reduces fusion frequencies. Telomere fusions were scored
as previously described”®.

Survival assays and chromosome analysis. PARPi survival assays and analysis 
of misrejoined chromosomes were carried out as described”, except that for 
analysis of radial chromosomes, MEFs were incubated with 0.5 μM Olaparib
(AZD2281) for 24 h before collection. For the survival assays, MEFs were seeded 
in 6-well plates in duplicate at 10, 50, 100, 500, 1,000, 5,000 or 10,000 cells per 
well. After 24 h, cells were treated with Olaparib at the indicated concentrations 
for 24 h. Cells were then provided with medium without Olaparib and incubated 
for one week with a medium change at day 4. Colonies were fixed and stained 
with 50% methanol, 2% methylene blue, rinsed with water and dried before 
counting. The survival percentage at each PARPi concentration compared to 
untreated cells was calculated using wells with 10-100 colonies. Two technical 
replicates at two cell concentrations were scored for each condition in three 
independent experiments. 
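
The survival calculation above can be sketched as follows. The methods state only that survival was calculated relative to untreated cells using wells with 10-100 colonies; normalizing colony counts to the number of cells seeded (a plating efficiency) before taking the treated/untreated ratio is an assumption made here so that wells seeded at different densities can be compared. Function names and counts are hypothetical.

```python
def plating_efficiency(wells, min_colonies=10, max_colonies=100):
    """Mean colonies per seeded cell over wells with a countable colony number.

    wells is a list of (cells_seeded, colonies_counted) pairs.
    """
    usable = [colonies / seeded for seeded, colonies in wells
              if min_colonies <= colonies <= max_colonies]
    if not usable:
        raise ValueError("no wells with a countable number of colonies")
    return sum(usable) / len(usable)


def survival_percent(treated_wells, untreated_wells):
    """Survival of treated cells as a percentage of the untreated control."""
    return 100.0 * plating_efficiency(treated_wells) / plating_efficiency(untreated_wells)


# Hypothetical colony counts: duplicate wells at two usable seeding densities,
# plus one well per condition that falls outside the 10-100 colony window.
untreated = [(50, 20), (50, 30), (100, 50), (100, 50), (1000, 400)]
treated = [(500, 25), (500, 25), (1000, 50), (1000, 50), (100, 4)]
survival = survival_percent(treated, untreated)
```

Wells with too few or too many colonies are simply skipped, mirroring the stated restriction to wells with 10-100 colonies.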

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability. All data that were generated and/or analysed in this study are 
included in the published paper and its Supplementary Information.

Extended Data Fig. 1 | Shieldin and CST counteract telomere hyper-resection.
a-c, Effect of SHLD2 on hyper-resection at telomeres that
lack TPP1. a, Immunoblot for CHK1-P, an indicator of TPP1 deletion, in
Tpp1F/F MEFs with and without bulk population treatment with a Shld2
sgRNA and/or Cre (representative of three experiments). b, Quantitative
analysis of telomere end resection as in Fig. 1c using the cells shown in a.
c, Quantification of the extent of resection detected in b, as in Fig. 1d. Mean
(centre bars) and s.d. (error bars) from four independent experiments.
*P < 0.05, **P < 0.01, two-tailed Welch's t-test. d, Fluorescence-activated
cell sorting (FACS) profiles of the indicated cells incubated with BrdU
to measure S phase effects of the Stn1 shRNA. Gating strategy for live
cells and singlets is shown below the FACS profiles. Representative of
two experiments. e, f, Experiments to verify that the single-stranded
DNA signal derives from a 3′ overhang. e, Immunoblot for STN1 and
γ-tubulin in Tpp 1 (Rift) cells treated with Stn1 shRNA and/or Cre.
Representative of two experiments. f, Quantitative assay for telomeric
overhangs, as in Fig. 1c. Plugs in the ExoI lanes were treated with the
3′ exonuclease from E. coli. Representative of two experiments.

Extended Data Fig. 2 | Hyper-resection at telomeres that lack TPP1 is
counteracted by CST and shieldin. a, Immunoblots showing absence
of REV7 and reduction of STN1 expression in the indicated Tpp1F/F
and Tpp1F/FRev7−/− MEFs treated with either Ctc1 or Stn1 shRNA.
Diminished STN1 expression is used as a proxy for the efficacy of the
Ctc1 shRNA. Representative of two experiments. b, Quantitative analysis
of telomeric overhangs, as in Fig. 1c. Representative of two experiments.
c, Quantification of the effect of Ctc1 and Stn1 shRNA on resection
at telomeres that lack TPP1, as in Fig. 1d. Data are obtained from two
independent REV7-proficient and two independent REV7-deficient clones
(light and dark shading).

[Extended Data Fig. 3 panel residue: y axis, normalized overhang signal
(ss[TTAGGG]n/total[TTAGGG]n); lanes ± shStn1 and ± Cre for Tpp1F/FRif1F/+ and
Tpp1F/FRif1F/F cells; relative ss TTAGGG signal normalized to total TTAGGG:
1.0, 0.0, 1.3, 0.0, 11.9, 0.3, 11.9, 0.3.]

Extended Data Fig. 3 | No effect of CST depletion on telomere hyper-resection
when 53BP1 or RIF1 are absent. a, SV40LT-immortalized
Tpp1F/F53bp1−/− cells were complemented with wild-type 53BP1 or a
mutant 53BP1 that lacks the ability to interact with RIF1, treated with
a Stn1 shRNA as indicated and analysed by immunoblotting for 53BP1
and STN1. Representative of four experiments. b, Quantitative analysis
of telomeric overhangs, as in Fig. 1c. c, Quantification of the resection at
telomeres that lack TPP1, in four independent experiments performed as
in Fig. 1d. d, Immunoblots showing loss of RIF1 and STN1 in the indicated
Tpp1F/FRif1F/+ and Tpp1F/FRif1F/F MEFs treated with Cre (96 h) as indicated,
and with or without Stn1 shRNA. Note the diminished levels of RIF1 after
Cre, owing to heterozygosity in the Tpp1F/FRif1F/+ cells. e, Quantitative
analysis of telomeric overhangs, as in Fig. 1c. f, Quantification of the
extent of resection detected, as in e, determined from three independent
experiments (indicated by different shades of grey) showing mean (centre
bars) and s.d. (error bars). Each experiment involved all indicated samples
analysed in parallel. g, h, Experiments to verify that the single-stranded
DNA signal derives from a 3′ overhang. g, Immunoblot for STN1 and
γ-tubulin in Tpp1F/FRif1F/F cells treated with Stn1 shRNA and/or Cre.
Representative of two experiments. h, Quantitative assay for telomeric
overhangs, as in Fig. 1c. Plugs in the ExoI lanes were treated with the 3′
exonuclease from E. coli. Representative of two experiments. *P < 0.05,
**P < 0.01, ***P < 0.001, ****P < 0.0001, two-tailed Welch's t-test.

Extended Data Fig. 4 | SHLD2 counteracts resection at telomeres that
lack TRF2. a, Immunoblots for TRF2 deletion and CHK2 phosphorylation
in Trf2F/FLig4−/− MEFs, with and without bulk population treatment with
a Shld2 sgRNA and/or Cre. Asterisk, non-specific band. Representative of
three experiments. b, Quantitative analysis of telomere end resection, as in
Fig. 1c, using the cells shown in a. c, Quantification of the extent
of resection detected in b, as in Fig. 1d. Mean (centre bars) and s.d.
(error bars) from four independent experiments. *P < 0.05, two-tailed
Welch's t-test.

Extended Data Fig. 5 | CST interacts with shieldin. a, Immunoprecipitation
of individual mouse CST subunits or the three subunit complex
(each subunit bearing a Myc tag) with Flag-tagged mouse SHLD1, co-expressed
in 293T cells. Flag-tagged POT1B and POT1A serve as positive
and negative controls for CST binding, respectively. Representative of two
experiments. b, Two-hybrid analysis of CST-shieldin interaction. Yeast
cultures were grown overnight in synthetic complete medium that lacked
tryptophan and leucine, to a density of 5 x 10’ cells per millilitre. Serial
tenfold dilutions were generated and 4 μl of each dilution was spotted
on synthetic complete medium that lacked the nutrients tryptophan,
leucine, adenine and histidine, and contained 3-aminotriazole (3-AT), as
indicated (panel labels: 1 mM and 8 mM 3-AT). Plates were then incubated
for 5 days at 30 °C before imaging.
Representative of three experiments.

Extended Data Fig. 6 | Localization of CST and Polα to DSBs.
a, Quantification of HA-STN1 localization to DSBs induced by FOKI, as
in Fig. 3e. Mean (centre bars) and s.d. (error bars) from 4-6 independent
experiments, with >80 induced nuclei for each condition in each
experiment. b, Immunofluorescence for endogenous Polα in FOKI-LacI
U2OS cells in S phase and after RO3306 treatment (G2). Dotted lines
denote the outline of the nucleus. Representative of two experiments.
c, Examples of HA-STN1 and Polα localization at DSBs induced by FOKI
in G2-arrested FOKI-LacI U2OS cells (as in Fig. 3f). Representative of
three experiments. d, Quantification of co-localization of Polα with DSBs
induced by FOKI (as in Fig. 3f). Mean (centre bars) and s.d. (error bars)
from three independent experiments, with >80 induced nuclei for each
condition in each experiment. **P < 0.01, ***P < 0.001, ****P < 0.0001,
two-tailed Welch's t-test.

Extended Data Fig. 7 | Effect of STN1 knockdown on the intensity of
RPA foci induced by ionizing radiation. Quantification of Myc-RPA32
intensity per nucleus in the experiments shown in Fig. 3g, h. Medians
(centre bars and numbers below) obtained from four independent
experiments, with >20 nuclei for each experimental condition in each
experiment. Each symbol represents one nucleus. *P < 0.05,
****P < 0.0001, two-tailed Welch's t-test.

Extended Data Fig. 8 | Effect of CST and Polα on PARPi treatment of
BRCA1-deficient cells. a-f, Immunoblots on the MEFs used in Fig. 4a-e
to verify the absence of deleted proteins and efficacy of the shRNAs.
Reduction in STN1 expression is used as a proxy for the efficacy of the
Ctc1 shRNA because no antibody to mouse CTC1 is available. Each
immunoblot is representative of three experiments. g, Immunoblots
for BRCA1 and STN1 in the cells used in Fig. 4f. Representative of
two experiments. h-j, Control experiment to assess that cells analysed
in Fig. 4f progressed through S phase during treatment with PARPi.
h, Experimental timeline, as in Fig. 4f, but with inclusion of BrdU in
the medium during treatment with PARPi. i, Example of the assay for
the presence of BrdU (immunofluorescence) in metaphases collected
after the experimental timeline, as in h. j, Quantification of the BrdU
incorporation into metaphase chromosomes, as in i (one experiment with
ten metaphases per condition).

53BP1 is a chromatin-binding protein that regulates the repair of
DNA double-strand breaks by suppressing the nucleolytic resection
of DNA termini}. This function of 53BP1 requires interactions with
PTIP? and RIF1*~, the latter of which recruits REV7 (also known
as MAD2L2) to break sites!®!!. How 53BP1-pathway proteins
shield DNA ends is currently unknown, but there are two models
that provide the best potential explanation of their action. In one
model the 53BP1 complex strengthens the nucleosomal barrier to
end-resection nucleases!*"’, and in the other 53BP1 recruits effector
proteins with end-protection activity. Here we identify a 53BP1
effector complex, shieldin, that includes C20orf196 (also known as
SHLD1), FAM35A (SHLD2), CTC-534A2.2 (SHLD3) and REV7.
Shieldin localizes to double-strand-break sites in a 53BP1- and
RIF1-dependent manner, and its SHLD2 subunit binds to
single-stranded DNA via OB-fold domains that are analogous to those of
RPA1 and POT1. Loss of shieldin impairs non-homologous
end-joining, leads to defective immunoglobulin class switching and
causes hyper-resection. Mutations in genes that encode shieldin
subunits also cause resistance to poly(ADP-ribose) polymerase
inhibition in BRCA1-deficient cells and tumours, owing to
restoration of homologous recombination. Finally, we show that
binding of single-stranded DNA by SHLD2 is critical for shieldin
function, consistent with a model in which shieldin protects DNA
ends to mediate 53BP1-dependent DNA repair.

To discover proteins acting in the 53BP1 pathway, we searched for
genes whose mutation restores homologous recombination in
BRCA1-deficient cells and leads to resistance to poly(ADP-ribose)
polymerase (PARP) inhibition, which is a hallmark of 53BP1 deficiency!*"!®.
We undertook three independent CRISPR-Cas9 screens that entailed
the transduction of BRCA1-deficient cells with lentiviral libraries of
single-guide RNAs (sgRNAs) (Extended Data Fig. 1a). The resulting
pools of edited cells were exposed to near-lethal doses of two clinically
used PARP inhibitors (PARPi), either olaparib or talazoparib!”. We
screened both an engineered human RPE1-hTERT TP53−/−BRCA1−/−
cell line (hereafter, RPE1 BRCA1KO) and SUM149PT cells carrying a
hemizygous BRCA1 frameshift mutation. The gene-based results of
the olaparib screens are found in Supplementary Table 1 and those of
the talazoparib screen were published previously'®.

The genes coding for 53BP1 and for the uncharacterized protein
C20orf196 were hits in all three screens (Fig. 1a). We also identified
SCAF1 and ATMIN, which encode an SR-family protein and a
transcription factor, as hits in the two olaparib screens (Fig. 1a) whereas
genes coding for proteins acting upstream (H2AX, MDC1, RNF8
and RNF168) or downstream (RIF1) of 53BP1 were hits in the RPE1
BRCA1KO screen (Supplementary Table 1). The presence of 53BP1 and
53BP1-pathway proteins suggested that these screens could reveal
hitherto uncharacterized 53BP1 effectors.

In competitive growth assays (Fig. 1b), sgRNAs targeting 53BP1 (also
known as TP53BP1) led to the outgrowth of BRCA1KO cells in the
presence of olaparib (Fig. 1c; genotyping information in Supplementary
Table 2). Similarly, sgRNAs targeting C20orf196, ATMIN and SCAF1 led
to resistance to PARPi (Fig. 1c and Extended Data Fig. 1b). In parallel
studies, transfection of CRISPR RNAs (crRNAs) and trans-activating
crRNAs (tracrRNAs) that target C20orf196, 53BP1 or PARP1 caused
talazoparib resistance in SUM149PT cells (Fig. 1d and Extended Data
Fig. 1c). Because C20orf196 was identified as a hit in all three screens
and validated in independent assays, we sought to determine its role
in DNA repair.
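
The Fig. 1c legend reports the competitive growth assays as the mean fraction of GFP-positive cells ± s.e.m., normalized to day 0, for three independent transductions. A minimal sketch of that summary is given below; dividing each replicate by its own day-0 value before averaging is an assumption, and the function name and fractions are hypothetical rather than data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_gfp_fraction(fractions_by_day):
    """Summarize GFP-positive fractions as (mean, s.e.m.) per day, relative to day 0.

    fractions_by_day maps day -> list of GFP-positive fractions, one entry per
    independent transduction, in the same replicate order on every day.
    The earliest day in the mapping is treated as day 0.
    """
    days = sorted(fractions_by_day)
    start = fractions_by_day[days[0]]
    summary = {}
    for day in days:
        # Normalize each replicate to its own starting fraction (assumed scheme).
        normalized = [value / first for value, first in zip(fractions_by_day[day], start)]
        sem = stdev(normalized) / sqrt(len(normalized))
        summary[day] = (mean(normalized), sem)
    return summary


# Hypothetical fractions for three transductions at day 0 and day 10.
gfp = {0: [0.5, 0.5, 0.5], 10: [0.6, 0.7, 0.8]}
summary = normalized_gfp_fraction(gfp)
```

A normalized value above 1 indicates outgrowth of the GFP-positive (sgRNA-expressing) population, which is how resistance to olaparib appears in this type of assay.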

C20orf196 is an uncharacterized protein comprising 205 amino
acid residues (Fig. 1e), previously identified as a candidate
REV7-interacting protein’’. We used immunoprecipitation coupled to mass
spectrometry (IP-MS) to expand the interaction network surrounding
the REV7-interacting proteins (Fig. 1f and Supplementary Table 3).
One protein, FAM35A, was enriched in both C20orf196 and REV7
immunoprecipitation samples (Fig. 1f). FAM35A was notable owing
to the presence of three predicted OB-fold domains (OBA, OBB
and OBC; Fig. 1e), reminiscent of those in the single-stranded DNA
(ssDNA)-binding proteins RPA1”° and POT 17". IP-MS experiments
with FAM35A recovered CTC-534A2.2, also identified in the REV7
IP-MS (Fig. 1f and Supplementary Table 3). CTC-534A2.2 is a protein
encoded by an alternative mRNA emanating from the TRAPPC13 locus
(Fig. 1e and Extended Data Fig. 1d). sgRNAs against CTC-534A2.2 were
not present in either of our first-generation sgRNA libraries. IP-MS
with CTC-534A2.2 recovered C20orf196, FAM35A and REV7 (Fig. 1f
and Supplementary Table 3), which suggests that these proteins form
a single protein complex; this was confirmed by sequential affinity
purification of epitope-tagged C20orf196, CTC-534A2.2, FAM35A and
REV7 (Fig. 1g).

FAM35A, C20orf196 and CTC-534A2.2 were identified in a fourth
CRISPR-Cas9 screen using a second-generation sgRNA library,
TKOv2. This screen sought to identify genes that promote resistance
to ionizing radiation in RPE1 cells (Extended Data Fig. 1e). Seventy-five
genes scored at a false discovery rate (FDR) < 1% and this gene
set was found to be highly enriched in genes that encode
non-homologous end-joining (NHEJ) factors by Gene Ontology enrichment
(P = 1.11 × 10⁻¹¹, Fisher’s exact test with multiple correction;
Fig. 2a and Supplementary Table 4). RIF1, FAM35A, C20orf196,
CTC-534A2.2, 53BP1 and REV7 were all hits at FDR < 1% (Fig. 2a). These
data suggest that the complex formed by C20orf196, FAM35A, REV7
and CTC-534A2.2 promotes the repair of double-strand breaks (DSBs)
by NHEJ. For reasons that will become apparent, we named this
complex shieldin and renamed C20orf196, FAM35A and CTC-534A2.2 as
SHLD1, SHLD2 and SHLD3, respectively.

Fig. 1 | Identification of shieldin. a, Venn diagram of the top 20 hits
in each screen. b, Schematic of the competitive growth assays.
c, Competitive growth assays ± olaparib (16 nM) in RPE1 BRCA1KO cells.
Data represent mean fraction of GFP-positive cells ± s.e.m., normalized
to day 0 (n = 3, independent transductions). d, Resistance to PARPi
caused by mutation of C20orf196. SUM149PT cells transfected with
indicated crRNAs were treated with 50 nM talazoparib for 14 d. Relative
growth was normalized to a non-targeting crRNA (Ctrl). Bars represent
mean ± s.d. of multiple crRNAs per gene (shown is a representative plot
of 2 biologically independent experiments). e, Domain architecture of
shieldin subunits. SHLD3 contains a REV7-binding PXXXPP motif”®.
f, Protein interaction network surrounding REV7, C20orf196, FAM35A
and CTC-534A2.2. Solid and dashed arrows represent interactions at an
FDR of < 1% and < 5%, respectively. g, Sequential affinity purifications
from 293T cell lysates expressing the indicated proteins. Bound proteins
were immunoblotted with the indicated antibodies (n = 2 independent
experiments). WCE, whole-cell extract.

118 | NATURE | VOL 560 | 2 AUGUST 2018

[Fig. 2 panel residue. a, gene-level plot (normZ < 0 only) with labelled hits
including REV7, FAM35A (SHLD2), CTC-534A2.2 (SHLD3), NHEJ1 and LIG4.
b, RPE1-hTERT TP53−/− Cas9 wild-type and BRCA1KO cells ± olaparib with
sgSHLD2-1 or sgSHLD3-1; x axis, time (d). f, sgEmpty, sgShld1 and sgShld2
groups, untreated or + olaparib; x axis, time after start of treatment (d).]

Fig. 2 | Shieldin loss promotes PARPi resistance in cell and tumour
models of BRCA1-deficiency. a, CRISPR dropout screen results in
RPE1 wild-type cells exposed to ionizing radiation. Gene-level normZ
scores < 0 are shown. b, Competitive growth assays using olaparib (16 nM)
in RPE1 BRCA1KO cells. Data are presented as mean ± s.d., normalized
to day 0 (n = 3, independent transductions). c, Clonogenic survival in
response to 16 nM olaparib. Representative images are shown (left) and
quantified (right). Bars represent mean ± s.d. (n = 9: RPE1 wild type and
BRCA1KOSHLD1KO; n = 3: BRCA1KOSHLD2KO; n = 4: BRCA1KO53BP1KO;
biologically independent experiments). d, Quantification of cells with > 5
RAD51 foci ± 10 Gy ionizing radiation (6 h recovery). Biologically
independent experiments are shown and the bars represent the mean ± s.d.
From left to right, the number of replicates was n = 3 (for both conditions
in left panel); n = 3, 4, 3, 4, 3 and 3 (for conditions in middle panel, left
to right); and n = 4, 6, 6, 6, 6, 6, 6 and 6 (for conditions in right panel, left
to right). e, Assessment of gene conversion by traffic light reporter assay.
Biologically independent experiments are shown and the bars represent
the mean ± s.d. (n = 3 for wild type and 53BP1KO; n = 4 for SHLD1KO,
SHLD2KO and REV7KO). f, Kaplan-Meier curve showing tumour-specific
survival of mice transplanted with KB1P4 tumour organoids ± olaparib
treatment for 80 d (n = 8 per treatment; editing efficiencies found in
Supplementary Table 2). P values were calculated using a log-rank test
(Mantel-Cox). IR, ionizing radiation; KO, knock out; WT, wild type.

Independent sgRNAs targeting SHLD2 or SHLD3 caused sensitivity
to the clastogen etoposide in competitive growth assays (Extended
Data Fig. 1f) and caused resistance to olaparib in RPE1 BRCA1KO cells,
consistent with SHLD2 and SHLD3 acting with REV7 and SHLD1
(Fig. 2b and Extended Data Fig. 1g). Clonal knockouts of SHLD1 or
SHLD2 led to olaparib resistance in BRCA1KO cells, and SHLD2KO
resulted in a phenotype that approached that of 53BP1 loss (Fig. 2c).
Similar results were obtained with 11 independent clonal knockouts
of SHLD1 in SUM149PT cells exposed to talazoparib (Extended Data
Fig. 1h). Furthermore, expressing GFP-SHLD2 in double knockout
BRCA1KOSHLD2KO cells restored olaparib sensitivity (Extended Data
Fig. 1i). Resistance to PARPi in shieldin-deficient BRCA1KO cells is
due to restoration of homologous recombination, as measured both by
RAD51 focus formation induced by ionizing radiation and by a reporter
for gene conversion (the traffic light reporter assay””) (Fig. 2d, e
and Extended Data Fig. 2a-d).

counting >100 cells. Data are mean ± s.d. d, Colocalization of GFP-tagged
shieldin subunits with mCherry foci in U2OS-FokI cells upon
mCherry-LacR-FokI expression. e, Quantification of GFP-SHLD3 or endogenous
REV7 focus intensity. Each point represents a cell. Lines represent the
112, 54 and 116, 49 and 111, and 117 (for SHLD3 siRNA, only REV7 foci

Next, we tested whether loss of shieldin causes resistance to PARPi in 
the KB1P mouse mammary tumour model that is deficient in Brcal and 
Trp53, which encodes p53 (ref. 7*). sgRNAs targeting Shld1 and Shld2 
led to resistance to PARPi in KB1P-G3 mammary tumour cells and in 
Brcal~/~Trp53 /~ mouse embryonic stem cells (Extended Data Fig. 3a, c). 
This resistance was also associated with restoration of homologous 
recombination (Extended Data Fig. 3b). Furthermore, transduction 
of Shld1- and Shld2-targeting sgRNAs suppressed the cell lethality asso- 
ciated with Brca1 loss in p53-proficient mouse embryonic stem cells 
(Extended Data Fig. 3d). We transduced the same sgRNAs into KB1P4 
mammary tumour organoids (Supplementary Table 2) and implanted 
them into the fat pads of mice. Olaparib treatment was initiated when 
tumours reached sizes of 50-100 mm³, and was continued for 80 days. 
Although all untreated mice succumbed to excessive tumour burden 
within 20 days, the control group responded to olaparib for the duration 
of the treatment (Fig. 2f). However, mice implanted with Shld1- 
and Shld2-mutated tumours exhibited a partial response to olaparib, 
with mice succumbing by day 60 (Fig. 2f). We conclude that shieldin 
loss causes resistance to PARPi in both human and mouse BRCA1- 
deficient tumour cells by reactivating homologous recombination. 

As expected of a complex with a direct role in DSB repair, shieldin 
accumulates at DSB sites in a 53BP1- and RIF1-dependent manner 
(Fig. 3a-f and Extended Data Fig. 4a-d). Loss of shieldin components 
did not impair formation of 53BP1 or RIF1 foci induced by ionizing 
radiation, indicating that shieldin acts downstream of 53BP1-RIF1 
(Extended Data Fig. 4e, f). Consistent with this possibility, we observed 
genetic epistasis between 53BP1 and the shieldin genes using the 
RAD51 focus-formation assay in RPE1 BRCA1-KO cells (Extended 
Data Fig. 5a). We also observed that SHLD1 and 53BP1 were epistatic 
in terms of modulating talazoparib resistance in SUM149PT cells 
(Extended Data Fig. 5b). 

Analyses of the dependencies within the shieldin complex indicate 
that SHLD3 is the most apical component followed by REV7, SHLD2 
and then SHLD1 (Fig. 3c-e, Extended Data Figs. 4a-d, 6, 7c-e and 
Supplementary Note 1 for mapping details). Indeed, SHLD3 interacts 
with RIF1, which suggests that SHLD3 recruits shieldin to chromatin- 
bound 53BP1-RIF1 (Fig. 3g and Extended Data Fig. 7f). Further 
mapping studies suggest that shieldin consists of a DSB-recruitment 
module composed of SHLD3-REV7 that binds to the N terminus of 
SHLD2 (residues 1-50; Extended Data Figs. 6, 7a-c), and a presumptive 
DNA-binding module (SHLD2-SHLD1) that features the OB-fold 
domains at the SHLD2 C terminus (hereafter SHLD2-C, residues 
421-904; Extended Data Fig. 7a). 

To assess the role of shieldin in NHEJ, we first analysed class switch 
recombination in CH12F3-2 cells. Mutation of each of the shieldin 
subunits compromised class switch recombination, with Shld1-edited 
cells having a reproducibly milder phenotype (Fig. 3h and Extended 
Data Fig. 8a-c). Shld2-KO was epistatic with both 53bp1-KO and Shld1-KO 
mutations, consistent with them acting in the same genetic pathway 
(Extended Data Fig. 8b, c). The expression of AID—which initiates 
class switch recombination—was not altered in shieldin mutants, 
consistent with NHEJ deficiency (Extended Data Fig. 8d). Supporting 
this possibility, SHLD1 and SHLD2 mutations impaired random plasmid 
integration (which occurs largely through NHEJ) to an extent 
similar to that of 53BP1-deficient cells (Extended Data Fig. 8e, f). 

Fig. 4 | Shieldin is an effector of 53BP1 by binding ssDNA. 
a, Schematic for artificially targeting shieldin to DSB sites. b, Formation 
of RAD51 foci induced by ionizing radiation 3 h after 10 Gy irradiation in 
BRCA1-KO 53BP1-KO cells expressing the indicated fusion proteins. Data are 
mean ± s.d. From top to bottom, the number of biologically independent 
experiments was n = 20, 22, 12, 12, 16, 4, 4, 4, 4, 4, 6, 6 and 3. c, EMSA of 
the SHLD2-C-SHLD1 complex isolated from 293T cells (see Extended 
Data Fig. 9b) incubated with radio-labelled ssDNA ± unlabelled 
oligonucleotides (n = 2 independent experiments). EV, empty vector. 
d, EMSA of SHLD2-C wild type and variants (n = 4 independent 
experiments). e, Determination of SHLD2-C-SHLD1 ssDNA binding 
dissociation constant (Kd). Data are mean ± s.d. (n = 3 independent 
experiments). Representative EMSA shown in Extended Data Fig. 9c. 
f, Model of shieldin function. We speculate that the SHLD2 OB-fold 
domains bind to ssDNA at DSB sites to suppress resection and favour 
NHEJ. 


The loss of each shieldin subunit led to ionizing-radiation-induced 
RPA32 Ser4/Ser8 phosphorylation, which is a surrogate marker of 
end-resection, suggesting that shieldin protects DNA ends (Fig. 3i and 
Extended Data Fig. 8g). Supporting this hypothesis, the restoration of 
homologous recombination in shieldin-defective KB1P-G3 cells was 
dependent on ATM activity (Extended Data Fig. 3a, b), which promotes 
DNA end-resection in the absence of 53BP1'° or REV7’®. An accompany- 
ing paper”® shows that Shld2-mutated cells have increased end-resection 
at dysfunctional telomeres. Shieldin therefore antagonizes end-resection. 

We surmised that if shieldin is a downstream effector of 53BP1, 
artificially targeting shieldin to DSB sites should rescue phenotypes 
associated with 53BP1 loss. To do this, we fused SHLD2 to the RNF8 
forkhead-associated (FHA) domain, which is recruited to damaged 
chromatin independently of 53BP1 (Fig. 4a). We found that the FHA- 
dependent targeting of SHLD2 to DSB sites suppressed RAD51 focus 
formation induced by ionizing radiation in BRCA1-KO 53BP1-KO cells 
(Fig. 4b and Extended Data Fig. 9a). These results suggest that SHLD2 
mediates 53BP1-dependent DNA repair. 

We observed that the FHA-SHLD2-C protein, which contains the 
OB-fold domains, potently suppressed RAD51 recruitment to DSB 
sites in BRCA1-KO 53BP1-KO cells (Fig. 4b and Extended Data Fig. 9a). 
This result suggested that DNA binding might underpin the effector 
function of the SHLD2 C terminus. To test for DNA binding activity, 
we affinity-purified SHLD2-C in the presence or absence of SHLD1 
(Extended Data Fig. 9b). We observed SHLD2-C binding to a 
radio-labelled ssDNA probe by electrophoretic mobility shift assays 
(EMSA), and competition with unlabelled oligonucleotides revealed 
that SHLD2-C preferentially binds to ssDNA over double-stranded 
DNA (Fig. 4c). Although SHLD1 is not essential for the DNA- 
binding activity of SHLD2-C, its presence increased the amount of 
SHLD2-C purified, and the retarded complex displayed a difference 
in mobility consistent with the SHLD2-C-SHLD1 complex binding 
to ssDNA (Fig. 4d, lane 2 versus 5). We estimate the binding affinity 
of the SHLD2-C-ssDNA interaction to be about 10 nM (Fig. 4e 
and Extended Data Fig. 9c). We conclude that SHLD2 possesses 
ssDNA-binding activity. 
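
As a rough illustration of how a dissociation constant can be read off an EMSA titration, the sketch below fits a single-site binding isotherm by a simple grid search. It assumes that protein is in large excess over the labelled probe, so that fraction bound = [P] / (Kd + [P]). The function names and the titration values are hypothetical and are not the data behind Fig. 4e.

```python
def fraction_bound(protein_nM, kd_nM):
    """Single-site binding isotherm (assumes protein in excess over probe)."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(concentrations_nM, observed_fractions, candidates_nM):
    """Return the candidate Kd with the smallest sum of squared residuals."""
    def sse(kd):
        return sum(
            (fraction_bound(c, kd) - f) ** 2
            for c, f in zip(concentrations_nM, observed_fractions)
        )
    return min(candidates_nM, key=sse)


# Hypothetical titration: synthetic fractions generated with Kd = 10 nM.
concentrations = [1, 3, 10, 30, 100]
fractions = [fraction_bound(c, 10.0) for c in concentrations]

# Candidate Kd values from 0.1 to 100.0 nM in 0.1 nM steps.
candidates = [k / 10 for k in range(1, 1001)]
estimated_kd = fit_kd(concentrations, fractions, candidates)
print(estimated_kd)
```

At a protein concentration equal to Kd, half of the probe is shifted, which is why the midpoint of the titration is a quick visual estimate of the affinity.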

To explore whether ssDNA binding is involved in shieldin func- 
tion, we generated four mutant versions (named m1-m4) of the 
SHLD2 OB folds by modelling the SHLD2 C terminus on an RPA1 
structure (RCSB Protein Data Bank code: 4GNX; Extended Data 
Fig. 9d). We also employed a splice variant of SHLD2 that disrupts 
OBB, which we refer to as SHLD2(S). We found that the SHLD2 m1 
and SHLD2(S) mutants, either in the context of full-length SHLD2 
or SHLD2-C proteins, were unable to suppress RAD51 focus formation 
in BRCA1-KO 53BP1-KO cells when fused to the RNF8 FHA domain 
(Fig. 4b and Extended Data Fig. 9a). Expression of full-length SHLD2 
m1 and SHLD2(S) in BRCA1-KO SHLD2-KO cells also failed to suppress 
RAD51 focus formation induced by ionizing radiation, unlike wild- 
type SHLD2 (Extended Data Fig. 10a, b). Importantly, both mutants 
localized to DSB sites (Extended Data Fig. 10c, d) and interacted with 
the other members of the shieldin complex (Extended Data Fig. 10e). 
Therefore, the SHLD2 m1 and SHLD2(S) mutants are defective in 
suppressing homologous recombination. 

Notably, the SHLD2-C m1 mutant was completely defective in 
ssDNA binding (Fig. 4d, lane 3) whereas the SHLD2(S)-C mutant 
displayed reduced and aberrant ssDNA-binding behaviour (Fig. 4d, 
lane 4). Because the m1 mutation produces a protein that is defective 
both in ssDNA-binding and suppression of homologous recombi- 
nation, but which is proficient in both complex assembly and DSB 
recruitment, we conclude that ssDNA binding by shieldin is critical 
for 53BP1-dependent DSB repair. 

In conclusion, the identification of shieldin forces us to re-evaluate 
how DNA end stability is regulated in vertebrates. Our results are 
consistent with a model in which shieldin is the ultimate mediator 
of 53BP1-dependent DNA repair by binding ssDNA and occluding 
access to resection nucleases (Fig. 4f). Our discovery of shieldin also 
has implications for the management of BRCA1-mutated malignancies, 
as alterations in shieldin-coding genes may cause clinical resistance 
to PARPi. 

METHODS 

Plasmids. DNA corresponding to sgRNAs was cloned into pX330 (Addgene: 
42230, Cambridge), LentiGuide-Puro (Addgene: 52963), LentiCRISPRv2 
(Addgene: 52961), or a modified form in which Cas9 was replaced by NLS-tagged 
GFP or mCherry using AgeI and BamHI (designated as LentiGuide-NLS-GFP 
or -mCherry), as described. Sequences of the sgRNAs used in this study are 
included in Supplementary Table 5. Coding sequences of C20orf196 and the short 
isoform of FAM35A were obtained from the ORFeome collection (http://horfdb. 
dfci.harvard.edu/), archived in the Lunenfeld-Tanenbaum Research Institute’s 
OpenFreezer. The complete coding sequence of the long isoform of FAM35A 
was generated by combining a synthesized fragment (GeneArt, Regensburg) cor- 
responding to the long isoform C terminus using an internal KpnI restriction site. 
The coding sequence of CTC534A2.2 was generated by gene synthesis (GeneArt). 
The coding sequences were PCR amplified using AscI and ApaI flanking primers 
and cloned into pcDNA5-FRT/TO-eGFP and pcDNA5-FRT/TO-Flag to obtain 
N-terminally tagged FAM35A, C20orf196 and CTC534A2.2. pGLUE-HA-Strep- 
FAM35A was generated by PCR amplification of the long isoform of FAM35A and 
cloning into pGLUE (Addgene: 15100) using AscI and NotI. To generate FAM35A 
fragments and mutants, standard protocols for primer-directed mutagenesis or 
self-ligation of truncated PCR-products were used. To generate pcDNA5-FRT/ 
TO-V5-CTC534A2.2, eGFP was replaced by a V5 tag in the cloning vector 
pcDNA5-FRT/TO-eGFP using KpnI and AscI restriction enzymes after which 
the coding sequence for CTC534A2.2 was PCR-amplified and inserted into 
pcDNA5-FRT/TO-V5-MCS using AscI and XhoI restriction enzymes. 

To generate RNF8-FHA fusions, the N terminus of RNF8 (amino acids 1-160) 
was PCR amplified from pcDNA3-RNF8-FHA(1-160)-RNF168 with flanking AscI 
sites and inserted into pcDNA5-FRT/TO-eGFP-FAM35A. eGFP (FHA) fusions of 
FAM35A were introduced into pCW57.1 (Addgene: 41393) by Gateway cloning using 
the pDONR221 donor vector. FAM35A amino acid substitution mutations and deletions 
were introduced by site-directed mutagenesis and deletion PCR, respectively. 

The REV7 coding sequence was obtained from the ORFeome collection and 
was cloned into the pDEST-FRT/TO-eGFP vector using Gateway cloning and into 
the pcDNA5-FRT/TO-Flag vector by PCR amplification. The N-terminal 967 res- 
idues of RIF1 were amplified by PCR and cloned into the pDONR221 vector using 
Gateway technology. The fragment was then integrated into the pDEST-mCherry- 
LacR vector by Gateway cloning. Plasmids for the traffic light reporter system 
were obtained from Addgene (pCVL-TrafficLightReporter-Ef1a-Puro lentivirus: 
#31482; pCVL-SFFV-d14GFP-Ef1a-HA-NLS-Sce(opt)-T2A-TagBFP: #32627). 
Cell lines and gene editing. 293T and RPE1-hTERT cell lines were obtained from 
ATCC (Manassas), 293 Flp-In cells were obtained from Invitrogen (Carlsbad) 
and SUM149PT cells were obtained from Asterand Bioscience (Detroit). U2OS 
ER-mCherry-LacIFokI-DD cells (U2OS-265, referred to in the text as U2OS-FokI) 
were a kind gift of R. Greenberg (University of Pennsylvania). All cell lines are 
routinely authenticated by STR-analysis and tested negative for mycoplasma. 293T, 
293 Flp-In, U2OS and RPE1 cells were cultured in high glucose- and GlutaMAX- 
supplemented DMEM (Gibco, Thermo Fisher Scientific, Waltham) + 1% penicillin-streptomycin 
(Thermo Fisher Scientific) and 10% heat inactivated fetal calf 
serum (Wisent, St-Bruno) at 37°C, 5% CO2. SUM149PT cells were cultured in 
Ham's F12 medium (Gibco) supplemented with 5% FCS, 10 mM HEPES, 1 µg/ml 
hydrocortisone (Sigma-Aldrich, St. Louis), and 5 µg/ml insulin (Sigma-Aldrich) 
at 37°C, 5% CO2. Except for RPE1 clonogenic survival assays, which were performed 
at 3% O2, cells were kept under normoxia conditions. Transient transfections 
of DNA and siRNA were performed using Lipofectamine 2000 (Thermo 
Fisher Scientific), PEI (Sigma-Aldrich) or calcium phosphate and Lipofectamine 
RNAiMAX, respectively (Thermo Fisher Scientific). siRNA efficiency was analysed 
by qPCR and immunoblotting. Stable integration of Flag-C20orf196/FAM35A/ 
REV7 with the Flp-In system was achieved by co-transfection of the pcDNA5-FRT/ 
TO plasmid with the recombinase vector pOG44 (Thermo Fisher Scientific) and 
hygromycin selection for integration. Lentiviral particles were produced in 293T 
cells by co-transfection of the targeting vector with vectors expressing VSV-G, 
RRE and REV using calcium phosphate or PEI (Sigma-Aldrich). Viral transductions 
were performed in the presence of 4-8 µg/ml polybrene (Sigma-Aldrich) 
at an MOI <1, unless stated otherwise. Transduced RPE1 cells were selected by 
culturing in the presence of 15 µg/ml puromycin. For the BRCA1-deficient mouse 
cell experiments, all experiments were performed using virus produced with the 
LentiCRISPRv2 backbone (see Supplementary Table 5) and cells were infected 
using polybrene (8 µg/ml). The medium was refreshed after 12 h and transduced 
cells were selected with puromycin. 

The generation of RPE1-hTERT TP53-/- BRCA1-KO Cas9 cells has been 
described elsewhere. REV7, 53BP1, FAM35A and C20orf196 gene knockouts 
were generated by electroporation of LentiGuide or LentiCRISPRv2 vectors 
using a Lonza Amaxa II Nucleofector (Basel) (for sgRNA sequences used see 
Supplementary Table 5; REV7 sgRNA1, FAM35A sgRNA2 and C20orf196 sgRNA1 
were used for clonal knockout generation). Twenty-four hours after transfection, 
cells were selected for 24-48 h with 15 µg/ml puromycin, followed by single clone 
isolation. Triple knockout cell lines of TP53, BRCA1 and 53BP1 were created by 
mutating BRCA1 in the TP53-/- 53BP1-KO double knockout cell line. Triple 
knockout cell lines of TP53, BRCA1 and REV7, FAM35A or C20orf196 were created 
by mutating REV7, FAM35A or C20orf196 in the TP53-/- BRCA1-KO cells. Loss of 
protein(s) was verified by immunoblotting when antibodies were available. Gene 
mutations were further confirmed by PCR amplification and TIDE analysis (for 
primers used for genomic PCR, see Supplementary Table 6; for genomic editing 
information, see Supplementary Table 2). 

To generate SUM149PT 53BP1, PARP1 or C20orf196 knockout populations of 
cells, SUM149PT-doxCas9 cells were treated with doxycycline for 24 h at 1 µg/ml 
before transfection with EditR crRNA (Dharmacon, Lafayette). Transfection 
of guides 53BP1_5_1, 53BP1_5_3, PARP1_5_2, PARP1_5_4, C20orf196_5-1, 
C20orf196_5-2, C20orf196_5-3 and C20orf196_5-5 (see Supplementary Table 5) 
was performed at a concentration of 20 nM (crRNA:tracrRNA) in the presence of 
doxycycline (1 µg/ml) using Lipofectamine RNAiMAX in 48-well plates (35,000 
cells per well). The following day cells were split 1:3, fed 24 h later with medium 
supplemented with 50 nM talazoparib (without doxycycline) and kept in batch 
culture or further split to generate single cell colonies. Drug-containing medium 
was replenished every 3-4 days until PARP inhibitor resistant pools or clones 
were established. Clones were subsequently picked, expanded and validated by 
genomic PCR and sequence analysis (for primers used, see Supplementary Table 6, 
for genomic editing information, see Supplementary Table 2). Four SUM149PT 
C20orf196-KO clones with mutations were chosen for further experimentation: clone 
A (C20orf196 5-1-C1), clone B (C20orf196 5-1-C2), clone C (C20orf196 5-3-C5) 
and clone D (C20orf196 5-5-C4). 

To generate 53BP1-KO double mutant clones, SUM149PT C20orf196 clones A 
and D were infected with a lentivirus expressing an sgRNA targeting TP53BP1 
or a non-targeting control sgRNA (for sequences, see Supplementary Table 5) in 
medium containing 1 µg/ml doxycycline. Forty-eight hours after infection, puromycin 
(1 µg/ml) was added to the medium. Selection was maintained for 3 d, 
until the uninfected control cells were killed. Pools of selection-resistant cells were 
seeded into 384-well plates for short term survival assays (see below) or subcloned 
to generate clonal lines. 

Mouse embryonic stem cells with a selectable conditional Brca1 deletion 
(Rosa26CreERT2/wt;Brca1SCo/Δ) were cultured on gelatin-coated plates in 60% 
buffalo rat liver (BRL) cell-conditioned medium supplied with 10% fetal calf 
serum, 0.1 mM β-mercaptoethanol (Merck, Kenilworth) and 10³ U/ml ESGRO LIF 
(Millipore, Burlington) under normal oxygen conditions (21% O2, 5% CO2, 37 °C). 

The KB1P-G3 2D cell line was previously established from a Brca1-/-p53-/- 
mouse mammary tumour and cultured as described. In brief, cells were cultured 
in DMEM/F-12 medium (Life Technologies, Carlsbad) in the presence of 10% 
FCS, penicillin-streptomycin (Gibco), 5 µg/ml insulin (Sigma-Aldrich), 5 ng/ml 
epidermal growth factor (Life Technologies) and 5 ng/ml cholera toxin (Gentaur, 
Kampenhout) under low oxygen conditions (3% O2, 5% CO2 at 37°C). 

The KB1P4 3D tumour organoid line was previously established from a 
Brca1-/-p53-/- mouse mammary tumour and cultured as described. Cells were 
seeded in Basement Membrane Extract Type 2 (BME, Trevigen, Gaithersburg) 
on 24-well suspension plates (Greiner Bio-One, Kremsmünster) and cultured in 
AdDMEM/F12 supplemented with 10 mM HEPES (Sigma-Aldrich), GlutaMAX 
(Invitrogen), penicillin-streptomycin (Gibco), B27 (Gibco), 125 µM N-acetyl- 
L-cysteine (Sigma-Aldrich), and 50 ng/ml murine epidermal growth factor 
(Invitrogen). 

CH12F3-2 mutant clones were edited either through transient transfection 
with pX330 plasmid constructs expressing sgRNAs against Trp53bp1 (sgRNA: 
Trp53bp1_e6_834, see Supplementary Table 5), Fam35a, and Ctc534a2.2 or by 
lentiviral lentiCRISPRv2 transduction for C20orf196. Double knockout cell lines 
of Fam35a and Trp53bp1 or C20orf196 were generated by transient transfection of a 
pX330 plasmid expressing an sgRNA against Trp53bp1 or by lentiviral transduction 
with lentiCRISPRv2 with an sgRNA targeting C20orf196. 
Antibodies, siRNAs and drugs. An overview of all the antibodies used in this study, 
including dilution factors, can be found in Supplementary Table 7. The following 
siRNAs from Dharmacon were used in this study (final siRNA concentration: 
10 or 20 nM): 53BP1: siRNA #2 (D-003548-02-0020); RIF1: siGENOME RIF1 siRNA 
(D-027983-02-0050); REV7: siGENOME MAD2L2 siRNA (M-003272-03-0010); 
C20orf196: SMARTpool: siGENOME C20orf196 siRNA (M-018767-00-0005); 
FAM35A: SMARTpool: siGENOME FAM35A siRNA (M-013761-01-0005); 
CTC534A2.2 (custom order): siRNA#1: 5'-GGACAAAACUCAAUCAAAU-3', 
siRNA#2: 5'-CAGUAGAUCUAUUGGAGUU-3', siRNA#3: 5'-CUGGAAGACAUUUGGACAA-3', 
siRNA#4: 5'-GCAAGAUAGUUUAAAGGCA-3' (used as a 
pool). 

The following drugs were used in the course of this study: olaparib 
(SelleckChem, Houston, or AstraZeneca, Cambridge), talazoparib (SelleckChem), 
cisplatin (Sigma-Aldrich), the ATM-inhibitor KU60019 (Sigma-Aldrich), and 
etoposide (Sigma-Aldrich). Concentration and duration of treatment are indicated 
in the corresponding figure legends. 

Olaparib resistance screens. Viral particles of the TKOv1 sgRNA library were produced 
as previously described. This library contains 91,320 sgRNA sequences, 
with a modal number of six sgRNAs per gene. Cas9-expressing cells were infected 
with an MOI <0.3 and the coverage of sgRNA representation was maintained at 
>100x (SUM149PT) or >200x (RPE1) (per replicate, if applicable). Twenty- 
four hours after transduction, transduced cells were selected for 120 h with 
10 µg/ml puromycin (RPE1) or 48 h with 3 µg/ml puromycin, followed by 72 h 
with 0.5 µg/ml puromycin (SUM149PT). Three days after transduction, the transduced 
cells were split into three technical replicates. Cells were passaged once 
every three days until nine days after infection, at which time olaparib (16 nM for 
RPE1 TP53-/- BRCA1-KO, 2 µM for SUM149PT) was added to the cells. Olaparib- 
containing medium was refreshed every 4 d. Cells were collected at 3, 9, 18 and 23 d 
post-infection (RPE1) or at 3, 9, 19 and 26 d post-infection (SUM149PT) for downstream 
processing as described. In short, total genomic DNA was isolated from 
2 × 10^7 (t3 sample) or 1 × 10^7 (later time points) cells using the QIAamp DNA Blood 
Maxi Kit (Qiagen, Germantown). DNA was precipitated with ethanol and sodium 
chloride and reconstituted in EB buffer (10 mM Tris-HCl pH 7.5). sgRNA sequences 
were PCR-amplified using primers harbouring Illumina TruSeq adapters with i5 and 
i7 barcodes, and the resulting libraries were sequenced on an Illumina NextSeq500 
(San Diego) using parameters previously described. Analysis was performed using 
model-based analysis of genome-wide CRISPR-Cas9 knockout (MAGeCK) version 
0.5.3, in conjunction with Python v3.5.1 on a Mac OS X El Capitan operating 
system. Non-treated samples collected at day 9 after transduction were compared 
to treated samples collected at day 23 (RPE1) or day 26 (SUM149PT). The positive 
score for each gene was calculated by using the 'run' function with the following 
arguments: mageck run -l /path/to/TKOv1_library/ -n 08-02-2017_141703 
--sample-label test,CTRL -t 1 -c 0 --fastq /path/to/fastq1 /path/to/fastq2. 
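
The coverage figures quoted for the screens translate into minimum cell numbers by simple arithmetic, sketched below. The library size (91,320 sgRNAs), the coverage targets (100x and 200x) and the MOI (0.3) come from the text; the function names are hypothetical, and the Poisson model used to convert MOI into a transduced fraction is an assumption, not something the methods state.

```python
import math


def cells_for_coverage(n_guides, coverage):
    """Transduced cells needed so each sgRNA is represented `coverage` times."""
    return n_guides * coverage


def cells_to_infect(n_guides, coverage, moi):
    """Cells to expose to virus, assuming Poisson-distributed integrations.

    Under that assumption the transduced fraction is 1 - exp(-moi).
    """
    transduced_fraction = 1.0 - math.exp(-moi)
    return math.ceil(cells_for_coverage(n_guides, coverage) / transduced_fraction)


TKOV1_GUIDES = 91_320  # library size stated in the text

print(cells_for_coverage(TKOV1_GUIDES, 100))    # SUM149PT coverage target
print(cells_for_coverage(TKOV1_GUIDES, 200))    # RPE1 coverage target
print(cells_to_infect(TKOV1_GUIDES, 200, 0.3))  # Poisson assumption
```

The 200x target for RPE1 works out to roughly 1.8 × 10^7 cells, in line with the 2 × 10^7 cells collected for the t3 sample.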

Ionizing radiation dropout screen and TKOv2 library. The TKOv2 lentiviral 
CRISPR library was used for whole-genome CRISPR knockout screening. To 
design TKOv2, all possible 20-mer sequences upstream of NGG PAM sites were 
collected where the SpCas9 double-strand break would occur within a coding 
exon (defined by hg19/Gencode v19 'appris_principal', 'appris_candidate_longest', 
or 'appris_candidate' transcript). Guides with 40-75% GC content were retained 
and further filtered to exclude homopolymers of length > 4, SNPs (dbSNP138), 
and relevant restriction sites, including BsmI (GAATCG) and BsmBI (CGTCTC). 
Candidate gRNA + PAM sequences were mapped to hg19 and sequences with 
predicted off-target sites in exons or introns, or sequences with more than two 
predicted off-target sites (with up to two mismatches) in any location, were dis- 
carded. Remaining guides were scored using a ‘sequence score table’ as previously 
described. Four guides per gene were selected, with a bias towards high sequence 
scores and maximal coverage across exons (that is, moderate-scoring guides target- 
ing different exons were preferred to high-scoring guides targeting the same exon). 
The final library contains 70,555 gRNA targeting 17,942 protein-coding genes, 
as well as 142 sequences targeting LacZ, luciferase, and eGFP. Oligonucleotide 
sequences were ordered from CustomArray (Bothell), PCR-amplified, and cloned 
into the pLCKO vector as previously described. 
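
The sequence-level filters in the guide design can be sketched as follows. This is an illustration only: it covers the GC-content window, the homopolymer exclusion and the restriction-site exclusion as worded above, and it omits the SNP, off-target and scoring steps. The function names are hypothetical, the site strings are copied as quoted in the text, and checking the reverse strand for restriction sites is an assumption.

```python
import re

# Restriction-site strings exactly as quoted in the text above.
EXCLUDED_SITES = ("GAATCG", "CGTCTC")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_long_homopolymer(seq, max_run=4):
    """True if any base repeats more than `max_run` times in a row."""
    return re.search(r"(.)\1{%d,}" % max_run, seq) is not None


def contains_site(seq, sites=EXCLUDED_SITES):
    """True if a site occurs on either strand (reverse strand is an assumption)."""
    rc = reverse_complement(seq)
    return any(site in seq or site in rc for site in sites)


def passes_filters(guide, min_gc=0.40, max_gc=0.75, max_run=4):
    """Apply the GC, homopolymer and restriction-site filters to one guide."""
    guide = guide.upper()
    if not min_gc <= gc_fraction(guide) <= max_gc:
        return False
    if has_long_homopolymer(guide, max_run):
        return False
    return not contains_site(guide)


# Hypothetical 20-mers, not taken from the library.
for candidate in ("ACGTACGTACGTACGTACGT", "AAAAACGTCGGCGGCGCGTA"):
    print(candidate, passes_filters(candidate))
```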

RPE1-hTERT TP53-/- Cas9-expressing cells were transduced with the lentiviral 
TKOv2 library (see below) at a low MOI (~0.35) and puromycin-containing 
medium was added the next day to select for transductants. Selection was contin- 
ued until 72 h post-transduction, which was considered the initial time point, t0. To 
identify ionizing radiation sensitizers, the negative-selection screen was performed 
by subculturing at days 3 and 6 (t3 and t6), at which point the cultures were split 
into two populations. One was left untreated while the second was treated with 3 Gy 
of ionizing radiation using a Faxitron X-ray cabinet (Faxitron, Tucson) every 3 d 
after day 6. Cell pellets were frozen at day 18 for gDNA isolation. Screens were per- 
formed in technical duplicates and library coverage of > 375 cells per sgRNA was 
maintained at every step. gDNA from cell pellets was isolated using the QIAamp 
Blood Maxi Kit (Qiagen) and genome-integrated sgRNA sequences were amplified 
by PCR using the KAPA HiFi HotStart ReadyMix (Kapa Biosystems, Wilmington). 
i5 and i7 multiplexing barcodes were added in a second round of PCR and final 
gel-purified products were sequenced on Illumina NextSeq500 systems to determine 
sgRNA representation in each sample. DrugZ was used to identify gene 
knockouts which were depleted from ionizing radiation-treated t18 populations 
but not depleted from untreated cells. 

Two-colour competitive growth assay. Twenty thousand cells were infected at an 
MOI of ~1.2 to ensure 100% transduction efficiency with either virus particles of 
NLS-mCherry LacZ-sgRNA or NLS-GFP GOI-sgRNA. Ninety-six hours after 
transduction, mCherry- and GFP-expressing cells were mixed 1:1 (2,500 cells + 
2,500 cells) and plated with or without olaparib (16 nM) or etoposide (100 nM) 
in 12-well format. During the course of the experiment, cells were subcultured 
when near-confluency was reached. Olaparib- or etoposide-containing medium 
was replaced every three days. Cells were imaged for GFP- and mCherry signal on 
the day of initial plating (t = 0) and on days 3, 6, 9, 12, 15 and 18 (olaparib), or, in a 
separate set of experiments, on days 5, 10, 15 and 20 (etoposide). Cells were imaged 
using the automatic InCell Analyzer (GE Healthcare Life Sciences, Marlborough) 
with a 4x objective. Segmentation and counting of the number of GFP-positive 
and mCherry-positive cells was performed using an Acapella script (PerkinElmer, 
Waltham). Efficiency of indel formation was analysed by performing PCR amplifi- 
cation of the region surrounding the sgRNA sequence and TIDE analysis on DNA 
isolated from GFP-expressing cells 9 d post-transduction. 
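
One common way to summarize such a two-colour assay is to follow the GFP-positive fraction over time relative to the day of plating. The sketch below shows that calculation; the normalization to t = 0 is an assumption (the text does not spell out the formula used), and the function names and cell counts are hypothetical.

```python
def gfp_fraction(gfp_count, mcherry_count):
    """Fraction of counted cells that are GFP-positive."""
    return gfp_count / (gfp_count + mcherry_count)


def normalized_gfp_fractions(counts):
    """GFP fraction at each time point divided by the fraction at t = 0.

    `counts` is a list of (gfp_count, mcherry_count) tuples ordered by time,
    with the first entry taken on the day of plating.
    """
    baseline = gfp_fraction(*counts[0])
    return [gfp_fraction(gfp, mcherry) / baseline for gfp, mcherry in counts]


# Hypothetical counts: a 1:1 mix at plating, then depletion of GFP cells.
example = [(2500, 2500), (1000, 3000)]
print(normalized_gfp_fractions(example))
```

Values below 1 indicate that cells carrying the gene-targeting sgRNA were outcompeted by the LacZ-sgRNA control, and values above 1 indicate a growth advantage.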

Mass spectrometry. Following 24 h of doxycycline-induction of stably integrated 
293 FLP-IN cells (expressing Flag, Flag-FAM35A, Flag-REV7, Flag-C20orf196, 
Flag-CTC-534A2.2), cell pellets from two 150-mm plates were lysed in 50 mM 
HEPES-KOH (pH 8.0), 100 mM KCl, 2 mM EDTA, 0.1% NP-40, 10% glycerol and 
affinity-purified using Flag-M2 magnetic beads (Sigma-Aldrich). Subsequently, 
digestion with trypsin (Worthington, Columbus) was performed on-beads. All 
immunoprecipitations were performed in biological replicates (three for CTC- 
534A2.2, five for C20orf196 and six for FAM35A and REV7). 

For liquid chromatography-tandem mass spectrometry analysis, peptides 
were reconstituted in 5% formic acid and loaded onto a 12-15-cm fused silica 
column with pulled tip packed with C18 reversed-phase material (Reprosil-Pur 
120 C18-AQ, 3 µm, Dr. Maisch, Ammerbuch-Entringen). Peptides were analysed 
using an LTQ-Orbitrap Velos (Thermo Scientific) or a 6600 Triple TOF (AB 
SCIEX, Framingham) coupled to an Eksigent NanoLC-Ultra HPLC system and 
a nano-electrospray ion source (Proxeon Biosystems, Thermo Fisher Scientific). 
Peptides were eluted from the column using a 90-100-min gradient of acetonitrile 
in 0.1% formic acid. Tandem mass spectrometry spectra were acquired in 
a data-dependent mode for the top 10 most abundant ions using collision- 
induced dissociation. After each run, the column was washed extensively to prevent 
carry-over. 

Mass spectrometry data extraction and interaction scoring was performed 
essentially as described previously. In short, raw mass spectrometry files were 
converted to mzXML and analysed using the iProphet pipeline, implemented 
within ProHits. The data were searched against the human and adenovirus 
complements of the Uniprot (forward and reverse) database (Version 2017_09; 
reviewed Swiss-Prot entries only), to which common epitope tags were added as 
well as common contaminants (common contaminants are from the Max Planck 
Institute, http://141.61.102.106:8080/share.cgi?ssid=0f2gfuB, and the Global 
Proteome Machine, http://www.thegpm.org/crap/index.html; 85,393 entries were 
searched). Mascot and Comet search engines were used with trypsin specificity 
(2 missed cleavages allowed) and deamidation (NQ) and methylation (M) as var- 
iable modifications. Charges +2, +3 and +4 were allowed with a parental mass 
tolerance of maximum 12 ppm and a fragment bin tolerance of 0.6 Da selected for 
Orbitrap instruments, while 35 ppm and 0.15 Da were allowed for the TripleTOF 
6600. For subsequent SAINT analysis (see below), only proteins with an iProphet 
protein probability > 0.95 were considered, corresponding to an estimated protein 
FDR of ~0.5%. 
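
To make the mass tolerances concrete, the sketch below converts a parts-per-million tolerance into an absolute window and checks whether an observed mass falls inside it. The 12 ppm and 35 ppm limits are the values quoted above; the function names and the example masses are hypothetical.

```python
def ppm_window(theoretical_mass, ppm):
    """Return (low, high) bounds of the window around a theoretical mass."""
    delta = theoretical_mass * ppm / 1e6
    return theoretical_mass - delta, theoretical_mass + delta


def within_ppm(observed_mass, theoretical_mass, ppm):
    """True if the observed mass deviates by at most `ppm` parts per million."""
    error_ppm = abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return error_ppm <= ppm


# Hypothetical precursor at 1000.0 Da observed at 1000.020 Da (20 ppm off).
print(ppm_window(1000.0, 12))
print(within_ppm(1000.020, 1000.0, 12))  # Orbitrap tolerance
print(within_ppm(1000.020, 1000.0, 35))  # TripleTOF tolerance
```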

Interactions were analysed with SAINTexpress (v3.6.1). SAINT probability 
scores were computed independently for each replicate against eight biological replicate 
analyses of the negative control (Flag alone; controls were 'compressed' to six 
virtual controls to increase robustness as described) and the average probability 
(AvgP) of the best three out of three (CTC534A2.2), five out of five (C20orf196) 
or six (FAM35A, REV7) biological replicates for each bait was reported as the 
final SAINT score. Preys with an estimated FDR < 1% were considered true 
interactors (AvgP > 0.91). The entire dataset, including the peptide identifica- 
tion and complete SAINTexpress output was deposited as a complete submis- 
sion in ProteomeXchange through the partner MassIVE housed at the Center for 
Computational Mass Spectrometry at University of California, San Diego (UCSD; 
http://massive.ucsd.edu). Data are available at MassIVE (ftp://massive.ucsd.edu/ 
MSV000082207). Unique accession numbers are MSV000082207 and PXD009313, 
respectively. Data can also be viewed at the prohits website (prohits-web.lunenfeld.ca) 
under dataset 29: Durocher laboratory. Data in Fig. 1f are represented using 
Cytoscape, using analyses with an FDR < 1 or 5%. 
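
The final scoring step described above (averaging per-replicate SAINT probabilities into AvgP and applying the AvgP > 0.91 cut-off) can be sketched as follows. This is not SAINTexpress itself: the per-replicate probabilities are taken as given, and the function names, prey names and values are hypothetical.

```python
def avg_p(replicate_probabilities):
    """Average SAINT probability across the replicates of one bait-prey pair."""
    return sum(replicate_probabilities) / len(replicate_probabilities)


def true_interactors(prey_probabilities, threshold=0.91):
    """Return the preys whose AvgP exceeds the threshold, sorted by name."""
    return sorted(
        prey
        for prey, probabilities in prey_probabilities.items()
        if avg_p(probabilities) > threshold
    )


# Hypothetical per-replicate probabilities for two preys of one bait.
example = {
    "PREY_A": [1.0, 0.98, 0.95],
    "PREY_B": [0.90, 0.50, 0.99],
}
print(true_interactors(example))
```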

Immunoprecipitation. 293T cells (1 × 10^7) were transfected with pcDNA5.1-FRT/ 
TO-Flag-c20orf196 (10 µg), -GFP-REV7 (2 µg), -V5-CTC534A2.2 (14 µg) and 
pGLUE-HA-Strep-FAM35A (14 µg) or empty vectors using a standard calcium 
phosphate or PEI protocol. After 48 h, cells were washed with PBS, scraped, and 
lysed in 1 ml of lysis buffer (50 mM Tris-HCl pH 8.0, 100 mM NaCl, 2 mM EDTA, 
0.5% NP-40, 10 mM NaF, 10 mM MgCl2 and 10 U/ml Benzonase (Sigma-Aldrich)) 
on ice for 30 min. Lysates were centrifuged at 15,000g for 5 min at 4°C, and supernatants 
were incubated with 100 µl of Streptavidin Sepharose High Performance 
beads (GE Healthcare) or Dynabeads M-280 Streptavidin magnetic beads 
(Invitrogen) for 1 h at 4°C. Beads were washed 5 times with lysis buffer and eluted 
with 10 mM D-biotin (Invitrogen) in lysis buffer for 2 h at 4°C. When applicable, 
the eluate was incubated with 20 µl of GFP-Trap_M resin (Chromotek, Planegg- 
Martinsried) for 1 h at 4°C, washed 5 times with lysis buffer and eluted by boiling 
in sample buffer. Pull-downs and whole cell extracts were loaded onto SDS-PAGE 
gels, followed by immunoblotting and probing with indicated antibodies. For GFP- 
CTC534A2.2 immunoprecipitations, an identical GFP-Trap pulldown procedure 
as above was used. For V5-CTC534A2.2 immunoprecipitations, lysates from one 
confluent 10-cm dish of 293T cells transfected with 10 µg pcDNA5.1-FRT/TO-V5- 
CTC534A2.2 vector were incubated with 10 µg/ml anti-V5 antibody (Invitrogen) for 
2 h at 4°C. Subsequently, 50 µl of protein G Dynabeads (Invitrogen) was added to 
the lysates and incubated for an additional 1 h at 4°C. Beads were washed 4 times 
with lysis buffer and boiled in 50 µl 2× SDS buffer. 

Clonogenic survival assays. RPE1-hTERT TP53−/− cells were seeded in
10-cm dishes (wild type, 250 cells; BRCA1ᴷᴼ53BP1ᴷᴼ, 500 cells; BRCA1ᴷᴼ or
BRCA1ᴷᴼC20orf196ᴷᴼ, 1,500 cells; BRCA1ᴷᴼFAM35Aᴷᴼ, 750 cells) in the presence
of 800 nM cisplatin or 16 nM olaparib or left untreated. Cisplatin dosing lasted
24 h, after which cells were grown in drug-free medium. Olaparib-containing
medium was refreshed after 7 d. After 14 d, colonies were stained with crystal
violet solution (0.4% (w/v) crystal violet, 20% methanol) and manually counted.
Relative survival was calculated for the drug treatments by setting the number of
colonies in non-treated controls at 100%.
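
The normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name and the example colony counts are hypothetical.

```python
def relative_survival(treated_colonies, untreated_colonies):
    """Express colonies in a drug-treated dish as a percentage of the
    untreated control, which is defined as 100%."""
    if untreated_colonies <= 0:
        raise ValueError("untreated control must contain at least one colony")
    return 100.0 * treated_colonies / untreated_colonies


# Hypothetical counts for one genotype: 120 colonies untreated, 30 with drug.
print(relative_survival(30, 120))
```

Because each genotype is normalized to its own untreated control, differences in seeding density between genotypes do not affect the reported value.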

For Rosa26CreERT2/wt;Brca1SCo/Δ cells, Cre-mediated inactivation of the
endogenous mouse Brca1SCo allele was achieved by overnight incubation of cells with
0.5 µM 4-hydroxytamoxifen (Sigma-Aldrich). Four days after switching, cells
were seeded in triplicate at 10,000 cells per well in 6-well plates for clonogenic
survival assays. For experiments with Rosa26CreERT2/wt;Brca1Δ/Δ;p53-null cells,
cells were plated in the presence of 15 nM olaparib. Cells were stained with 0.1%
crystal violet one week later. Clonogenic survival assays with PARPi (olaparib) and
ATMi (KU60019) combination treatment were performed as described previously
with minor adjustments. Five thousand KB1P-G3 cells were seeded per well into
6-well plates on day 0, and then PARPi, ATMi, both or neither were added. The
medium was refreshed every 3 d. On day 6, the ATMi alone and untreated groups
were stopped and stained with 0.1% crystal violet; the other groups were stopped
and stained on day 9. Plates were scanned with a GelCount (Oxford Optronics,
Abingdon). Quantifications were performed by solubilizing the retained crystal
violet using 10% acetic acid and measuring the absorbance at 562 nm using a Tecan
plate reader (Tecan, Männedorf).
Short-term survival assays. Ten thousand RPE1-hTERT Cas9 TP53−/− parental
cells and additional mutants (BRCA1ᴷᴼ and/or FAM35Aᴷᴼ) with or without stable
integration of indicated eGFP fusions by viral transduction were seeded in 12-well
format with or without 200 nM olaparib (and 1 µg/ml doxycycline if applicable).
Medium with olaparib (and doxycycline) was replaced after 4 d, and cells were
trypsinized and counted after seven days using an automated Z2 Coulter Counter
analyser (Beckman Coulter, Indianapolis).

SUM149PT cells were plated at 500 cells per well in 384-well plates and varying 
amounts of talazoparib in DMSO were added the following day using an Echo 550 
liquid handler (Labcyte, San Jose). After 5 d of growth, cell survival was assayed 
using CellTiter-Glo according to the manufacturer's protocol (Promega, Madison). 
Immunofluorescence. For 53BP1 immunofluorescence, cells were cultured on 
coverslips and treated with 5 or 10 Gy X-ray irradiation and fixed with 2-4% 
paraformaldehyde (PFA) 1 h after irradiation. Cells were permeabilized with 0.3% 
Triton X-100, followed by blocking in 10% goat serum, 0.5% saponin, 0.5% NP-40 
in PBS (blocking buffer A). Cells were co-stained using 53BP1 and γH2AX primary
antibodies (see Supplementary Table 7) in blocking buffer A for 1.5 h at room
temperature, followed by 4 washes in PBS, incubation with appropriate secondary
antibodies in blocking buffer A plus 0.8 µg/mL DAPI for 1.5 h at room temperature,
and finally four washes in PBS. 

For RAD51 immunofluorescence, cells with or without stable integration of 
eGFP-tagged proteins or sgRNAs via viral transduction were grown on glass 
coverslips and treated with 10 Gy X-ray irradiation and recovered for 3 to 6 h (as 
indicated). Cells were fixed and extracted using 1% PFA, 0.5% Triton X-100 in
PBS for 20 min at room temperature, followed by a second extraction/fixation 
using 1% PFA, 0.3% Triton X-100, 0.5% methanol in PBS for 20 min at room tem- 
perature. Blocking, primary and secondary antibody incubations (1.5 h at room 
temperature followed by 4 PBS washes) were performed in BTG buffer (10 mg/mL 
BSA, 0.5% Triton X-100, 3% goat serum, 1 mM EDTA in PBS) or PBS* (0.5% BSA, 
0.15% glycine in PBS). 

For REV7 and RIF1 immunofluorescence, cells were grown on glass coverslips, 
treated with 5 or 10 Gy X-ray irradiation and fixed with 2-4% PFA 1-2 h after 
irradiation. Cells were permeabilized with 0.3% Triton X-100. For REV7 blocking, 
primary and secondary antibody incubations (1.5 h at room temperature followed 
by 4 washes in PBS) were performed in blocking buffer A. For RIF1 blocking, 
primary and secondary antibody incubations (1.5 h at room temperature followed 
by 4 washes) were performed in PBG buffer (0.2% cold water fish gelatin (Sigma 
Aldrich), 0.5% BSA in PBS). 

For GFP-shieldin focus and laser stripe analysis, U2OS or RPE cells were
grown on glass coverslips and either transiently transfected with 1 µg vector
expressing GFP-FAM35A or -CTC534A2.2, or virally transduced with
GFP-FAM35A-expressing vector. Forty-eight hours after transfection, or 24 h after
0.5 µg/ml doxycycline induction, cells were treated with 5 Gy X-ray irradiation or
micro-irradiated, pre-extracted 10 min on ice with NuEx buffer (20 mM HEPES,
pH 7.4, 20 mM NaCl, 5 mM MgCl2, 0.5% NP-40, 1 mM DTT and protease
inhibitors) followed by 10 min 2% PFA fixation 1 h post-ionizing radiation or
micro-irradiation. Antibody staining and blocking were performed as described above
except in PBS + 0.1% Tween-20 + 5% BSA using GFP and γH2AX antibodies.

DAPI (0.8 µg/ml) was included in all experiments to stain nuclear DNA.
Coverslips were mounted using Prolong Gold mounting reagent (Invitrogen) or
Aqua PolyMount (Polyscience, Warrington). Images were acquired using a Zeiss
LSM780 laser-scanning microscope (Oberkochen), a Leica SP8 confocal
microscope (Wetzlar) or a Zeiss AxioImager D2 widefield fluorescence microscope. Foci
were manually counted.

RAD51 immunofluorescence in KB1P-G3 cells was performed as previously
described, with minor modifications. Cells were grown on 8-well chamber slides
(Millipore). Ionizing-radiation-induced foci were induced by gamma-irradiation
(10 Gy) 4 h before sample preparation. Cells were then washed in PBS++ (2%
BSA, 0.15% glycine, 0.1% Triton X-100) and fixed with 2% PFA/PBS++ for 20
min on ice. Fixed cells were washed with PBS++ and were permeabilized for
20 min in 0.2% Triton X-100/PBS++. All subsequent steps were performed in
PBS++. Cells were washed thrice and blocked for 30 min at room temperature,
incubated with the primary antibody for 2 h at room temperature, washed thrice
and incubated with the secondary antibody for 1 h at room temperature. Lastly,
cells were mounted and counterstained using Vectashield mounting medium with
DAPI (H1500, Vector Laboratories, Burlingame).

Scale bars indicated in the figure panels represent 10 µm, unless stated otherwise.
LacR-RIF1 N-terminus and FokI-induced focus formation. For monitoring
recruitment of GFP-tagged shieldin subunits to mCherry-LacR-Rif1 (1-967) foci,
150,000 U2OS-FokI cells (known also as U2OS-DSB) were seeded on 6-well
plates containing glass coverslips without any induction of FokI. Twenty-four hours
after seeding, cells were transfected using 1 µg of pDEST-mCherry-LacR or
pDEST-mCherry-LacR-Rif1 (1-967), if applicable, and 0.5-1 µg of GFP fusion expression
vectors. Cells were fixed with 4% PFA 24-48 h after transfection. For monitoring
the localization of the FAM35A N terminus to Rif1 (1-967) foci with siRNA
knockdown of other shieldin subunits, an essentially identical protocol was used with
the following adjustments: 350,000 U2OS-FokI cells were reverse-transfected
with Lipofectamine RNAiMAX-siRNA (10 nM) complex. Twenty-four hours after
siRNA transfection, the mCherry-LacR and GFP fusion plasmids were transfected.
Cells were fixed with 4% PFA 48 h after DNA transfection. For monitoring
recruitment of GFP-tagged shieldin subunits to DSBs at the LacO array, FokI
stabilization and nuclear translocation were induced by treating cells with 0.1 µM Shield1
(Clontech, Mountain View) and 10 µg/ml hydroxytamoxifen for 4 h.

ImageJ (https://imagej.nih.gov/ij/) was used to quantify foci in the U2OS-FokI 
system. An mCherry focus and DAPI nuclear signal were used to generate masks. 
The average GFP fluorescence or immunofluorescence intensity in the mCherry 
focus mask was divided by the corresponding average nuclear intensity, and the 
ratio is reported. Cells displaying a ratio of focus/nuclear average intensity >3 are 
defined as containing a focus. 
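
The focus-scoring rule above (mean intensity inside the mCherry focus mask divided by mean nuclear intensity, with a cut-off of 3) can be illustrated with a short Python sketch. The original quantification was done in ImageJ; the function names and the flat lists of pixel intensities used here are assumptions made purely for illustration.

```python
from statistics import mean

# Ratio above which a cell is scored as containing a focus (value from the text).
FOCUS_THRESHOLD = 3.0


def focus_to_nuclear_ratio(focus_pixels, nuclear_pixels):
    """Mean intensity inside the focus mask divided by the mean
    intensity inside the nuclear (DAPI) mask."""
    nuclear_mean = mean(nuclear_pixels)
    if nuclear_mean <= 0:
        raise ValueError("mean nuclear intensity must be positive")
    return mean(focus_pixels) / nuclear_mean


def contains_focus(ratio, threshold=FOCUS_THRESHOLD):
    """A cell is focus-positive only when the ratio strictly exceeds the threshold."""
    return ratio > threshold


# Hypothetical intensities for one cell.
ratio = focus_to_nuclear_ratio([400, 500, 600], [100, 100, 100, 100])
print(ratio, contains_focus(ratio))
```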

Microirradiation. For laser microirradiation, virally transduced RPE1 cells
expressing the indicated eGFP-tagged proteins were grown on glass coverslips and
transfected with siRNAs. Forty-eight hours after transfection, protein expression
was induced using 0.5 µg/ml doxycycline, and 24 h later cells were presensitized
with 1 µg/ml Hoechst for 15 min at 37°C. DNA damage was introduced with a
355-nm laser (Coherent, Santa Clara, 40 mW) focused through a Plan-Apochromat
40× oil objective to yield a spot size of 0.5-1 µm using a LSM780 confocal
microscope (Zeiss) and the following laser settings: 100% power, 1 iteration, frame size
128 × 128, line step 7, pixel dwell: 25.21 µs.

Traffic light reporter assay. Cells were infected with pCVL.TrafficLightReporter.
Ef1a.Puro lentivirus at a low MOI (0.3-0.5) and selected with puromycin (15 µg/ml).
Cells (7 x 10°) were nucleofected with 5 µg of pCVL.SFFV.d14GFP.Ef1a.
HA.NLS.Sce(opt).T2A.TagBFP plasmid DNA in 100 µl of electroporation buffer
(25 mM Na2HPO4 pH 7.75, 2.5 mM KCl, 11 mM MgCl2), using program T23 on a
Nucleofector 2b (Lonza). After 72 h, GFP and mCherry fluorescence was assessed in
BFP-positive cells using a Fortessa X-20 (BD Biosciences, San Jose) flow cytometer.
Phospho-RPA immunoblotting. For phospho-RPA staining, CH12 cells were left
untreated, or were treated with 25 Gy of ionizing radiation using a Faxitron X-ray
cabinet, and were then collected by centrifugation 3 h later. Pellets were lysed on ice
for 10 min in high salt lysis buffer (50 mM Tris-HCl pH 7.6, 300 mM NaCl, 1 mM
EDTA, 1% Triton X-100, 1 mM DTT, 1× EDTA-free protease inhibitor cocktail
(Roche, Basel)), cleared by centrifugation at 20,000g for 10 min at 4°C, and
quantified by bicinchoninic acid assay (BCA; Pierce, Thermo Fisher Scientific). Equal
amounts of whole-cell extracts were separated by SDS-PAGE on 4-12% Bis-Tris
gradient gels (Invitrogen), transferred to nitrocellulose and immunoblotted for
pRPA32 (S4/S8).

Mouse mammary tumour models. All animal experiments were approved by
the Animal Ethics Committee of The Netherlands Cancer Institute (Amsterdam)
and performed in accordance with the Dutch Act on Animal Experimentation
(November 2014). KB1P4 tumour organoids were transduced using spinoculation
as previously described. NMRI-nude female mice were purchased from Janvier
Laboratories and used for transplantation studies at the age of 6-9 weeks. A power
analysis was performed to calculate that a minimum of 8 mice per group were
needed to achieve a power of 0.8 (two-sided test, alpha = 0.05). Tumour organoids
were allografted in mice as previously described with minor adjustments. In brief,
tumour organoids were collected, incubated with TrypLE at 37°C for 5 min,
dissociated into single cells, washed and embedded in a 1:1 mixture of tumour organoid
culture medium and Basement Membrane Extract (Trevigen) in a cell
concentration of 10* cells per 40 µl. Subsequently, 10° cells were injected in the fourth right
mammary fat pad of NMRI nude mice. Mammary tumour size was determined by
caliper measurements and tumour volume was calculated (0.5 × length × width²).
Treatment of tumour-bearing mice was initiated when tumours reached a size of
50-100 mm³. Mice were randomly allocated into the untreated (n = 8) or olaparib
treatment group (n = 8). Olaparib was administered in a blinded fashion at 100 mg/kg
intraperitoneally for 80 consecutive days. Animals were killed with CO2 when
the tumour reached a volume of 1,500 mm³. The tumour was collected, fixed in
formalin for histology and several tumour pieces were collected for DNA analysis.
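
The tumour volume formula and the two size thresholds given above can be written out as a small Python sketch. The function names are hypothetical and the example caliper readings are invented for illustration; only the formula and the 50 mm³ and 1,500 mm³ values come from the text.

```python
TREATMENT_START_MM3 = 50.0   # lower bound of the 50-100 mm^3 treatment-start window
ENDPOINT_MM3 = 1500.0        # volume at which animals were killed


def tumour_volume(length_mm, width_mm):
    """Tumour volume in mm^3 from caliper measurements: 0.5 x length x width^2."""
    return 0.5 * length_mm * width_mm ** 2


def treatment_can_start(volume_mm3):
    """True once the tumour has reached the lower bound of the start window."""
    return volume_mm3 >= TREATMENT_START_MM3


def endpoint_reached(volume_mm3):
    """True once the tumour has reached the humane endpoint volume."""
    return volume_mm3 >= ENDPOINT_MM3


# Hypothetical caliper readings: 8 mm long, 5 mm wide.
volume = tumour_volume(8, 5)
print(volume, treatment_can_start(volume), endpoint_reached(volume))
```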
Class switch recombination assays. To induce switching in CH12F3-2 murine B
cell lymphoma cells, 2 x 10° cells were cultured in CH12 medium supplemented
with a mixture of IL4 (10 ng/mL, R&D Systems #404-ML-050, Minneapolis), TGFβ
(1 ng/ml, R&D Systems #7666-MB-005) and anti-CD40 antibody (1 µg/ml,
#16-0401-86, eBioscience, Thermo Fisher) for 48 h. Cells were then stained with
anti-IgA-PE and fluorescence signal was acquired on an LSR II or Fortessa X-20 flow
cytometer (BD Biosciences). To probe AID levels in the stimulated cells,
immunoblotting was performed on total cell lysates using anti-AID and anti-β-actin
antibodies (Supplementary Table 7). Bands were quantified using ImageJ.
Plasmid integration assay. Two hundred thousand RPE1 cells were seeded into
6-well plates and 24 h later transfected with 2 µg of BamHI/EcoRI-linearized
pEGFP-C1 using PEI. Seventy-two hours after transfection, cells were seeded for colony
formation into 10-cm dishes in the presence (50,000 cells per dish) or absence
(500 cells per dish) of 600 µg/ml G418. At this point, transfection efficiency was
analysed by measuring GFP-positivity using flow cytometry. Medium with G418
was refreshed every 3 d. Fourteen days after seeding, colonies were stained with
crystal violet solution and manually counted. NHEJ efficiency was calculated
according to the following formula:


$$\text{NHEJ efficiency} = \frac{\text{Percentage of surviving colonies on selection}}{\text{Percentage of surviving colonies without selection} \times \text{percentage of transfected cells}}$$


The data shown for the different knockout clones in Extended Data Fig. 8e repre- 
sent NHEJ efficiency as calculated with the above formula, followed by normali- 
zation to wild-type cells (for which NHEJ efficiency is set to 100%). 
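
As a worked illustration of the formula and the wild-type normalization, the calculation can be sketched in Python. This is not the authors' code: the function names and all colony counts and transfection percentages are hypothetical, and only the seeding densities (50,000 and 500 cells per dish) are taken from the text.

```python
def percent_surviving(colonies, cells_seeded):
    """Colonies formed as a percentage of the cells seeded."""
    return 100.0 * colonies / cells_seeded


def nhej_efficiency(pct_on_selection, pct_without_selection, pct_transfected):
    """Survival on selection divided by (survival without selection
    x percentage of transfected cells), as in the formula above."""
    return pct_on_selection / (pct_without_selection * pct_transfected)


def normalize_to_wild_type(value, wild_type_value):
    """Express a value relative to wild type, with wild type set to 100%."""
    return 100.0 * value / wild_type_value


# Hypothetical wild-type dish: 100 colonies from 50,000 cells on G418,
# 250 colonies from 500 cells without G418, 40% GFP-positive cells.
wt = nhej_efficiency(percent_surviving(100, 50_000), percent_surviving(250, 500), 40.0)

# Hypothetical knockout clone: half as many G418-resistant colonies.
ko = nhej_efficiency(percent_surviving(50, 50_000), percent_surviving(250, 500), 40.0)

print(normalize_to_wild_type(ko, wt))
```

Any constant scale factor in the raw efficiency (for example, using fractions instead of percentages) cancels in the final normalization to wild type.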
DNA binding assays. Shieldin proteins were isolated using the
immunoprecipitation protocol described above with the following modifications. 293T
cells were transfected with pGLUE-FAM35A(421-904), the indicated mutants
of this construct, or the empty pGLUE Strep/HA-tagging vector and
pcDNA5.1-FRT/TO-Flag-C20orf196 in a 2:1 ratio for a total of 10 µg per 10-cm
dish. Complexes were immunoprecipitated as described, except using a reduced
NP-40 detergent concentration (0.1%) for the last two washes and elution buffer.
Eluted proteins were concentrated using Amicon Ultra 0.5 ml 10K
centrifugal filter units (Millipore). Concentrations of isolated proteins were estimated
by SDS-PAGE and Coomassie staining, followed by comparison to a standard
curve of known bovine serum albumin (BSA) concentrations measured by
fluorescence in the 700-nm channel of the Odyssey imager (LI-COR). A
radiolabelled ssDNA probe was prepared by T4 polynucleotide kinase (New England
Biolabs, Ipswich) phosphorylation of HPLC-purified 59-nt DNA
oligonucleotide (BioBasic, Markham;
TACGTTAGTATGCGTTCTTCCTTCCAGAGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTT)
using [γ-32P]ATP (3,000 Ci/mmol, 10 mCi/ml; Perkin-Elmer, Woodbridge). Unlabelled
competitors were prepared using the same oligonucleotide sequence alone or
hybridized to the complementary sequence
(AAAAAAAAAAAAAAAAAAAAAAAAAAAAACCTCTGGAAGGAAGAACGCATACTAACGTA)
by heating at 80°C for 10 min and gradual cooling to room temperature overnight.

For electrophoretic mobility shift assays, 20 nM of labelled ssDNA probe was
incubated with purified proteins for 20 min in the elution buffer with the
addition of 1 mM DTT and 1 mg/ml BSA at room temperature. Glycerol was then
added to a final concentration of 8.3% and resolved on 6% acrylamide-TAE gels.
Gels were adhered onto blotting paper (VWR) and enclosed in plastic wrap. Gels
were exposed to a storage phosphor screen (GE Healthcare) and visualized using
a Typhoon FLA 9500 biomolecular imager (GE Healthcare). Dissociation
constant (Kd) was determined in GraphPad Prism from nonlinear regression analysis
assuming single-site specific binding of saturation titration experiments, defining
all signal above the free probe band to be bound probe, as measured in ImageQuant
TL (GE Healthcare). The fraction of probe bound is defined as:


$$\text{Fraction of probe bound} = \frac{\text{Signal of bound probe}}{\text{Signal of bound probe} + \text{signal of free probe}}$$


and the concentration of unbound FAM35AC-C20orf196 (referred to in the text as
SHLD2-C-SHLD1) complex is calculated by multiplying the fraction of probe 
bound by the initial concentration of ssDNA probe, and subtracting this from the 
initial concentration of SHLD2-C-SHLD1, given the assumption of 1:1 binding. 
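
The two quantities defined above can be computed as in the following Python sketch. It is illustrative only: the function names and the band intensities are hypothetical, while the 20 nM probe concentration and the 1:1 binding assumption are taken from the text. The Kd itself was fitted in GraphPad Prism and is not reproduced here.

```python
PROBE_NM = 20.0  # labelled ssDNA probe concentration used in the assay (from the text)


def fraction_bound(bound_signal, free_signal):
    """Signal of bound probe divided by total (bound + free) probe signal."""
    total = bound_signal + free_signal
    if total <= 0:
        raise ValueError("total probe signal must be positive")
    return bound_signal / total


def unbound_complex_nm(total_complex_nm, frac_bound, probe_nm=PROBE_NM):
    """Free SHLD2-C-SHLD1 concentration assuming 1:1 binding: the bound probe
    concentration (fraction bound x probe) is subtracted from the total complex."""
    return total_complex_nm - frac_bound * probe_nm


# Hypothetical lane: 30 units of shifted signal, 70 units of free probe,
# with 100 nM complex added to the reaction.
f = fraction_bound(30.0, 70.0)
print(f, unbound_complex_nm(100.0, f))
```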
Statistical analysis. All data are represented as individual replicates and replicate number, 
mean and error bars are explained in the figure legends. The statistical tests we used 
(all of which were common tests) and resulting P values are indicated in the figure 
legends and/or figure panels and have been generated using GraphPad Prism software. 
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. All source data represented in the graphs displayed in this 
article are available online (Supplementary Data 1-12). Uncropped western 
blots can be found online as Supplementary Fig. 1. Data of the CRISPR-Cas9 
screens are included as Supplementary Table 1 (PARPi positive selection screens) 
or Supplementary Table 4 (ionizing radiation sensitivity dropout screen). IP-MS 
data (Supplementary Table 3) are available at MassIVE
(ftp://massive.ucsd.edu/MSV000082207, with unique accession numbers MSV000082207 and
PXD009313). IP-MS data can also be viewed at the prohits website
(http://prohits-web.lunenfeld.ca) under dataset 29: Durocher laboratory.
Extended Data Fig. 1 | The identification of the shieldin complex and
its role in the response to genotoxic treatments. a, Schematic of the
PARPi resistance screens. b, Competitive growth assays determining the
capacity of the indicated sgRNAs to cause resistance to PARP inhibitors
in RPE1 BRCA1ᴷᴼ cells. Data are presented as the mean fraction of
GFP-positive cells ± s.e.m., normalized to day 0 (n = 3, independent
viral transductions). Gene-editing efficiencies of the sgRNAs can be
found in Supplementary Table 2. Note that we have not been able to
obtain TIDE data for the ATMIN-targeting sgRNAs. c, Representative
images of SUM149PT-Cas9 cells transfected with indicated crRNAs
(see Methods) and exposed to 50 nM talazoparib for 14 d. Purple
colour indicates cells detected by Incucyte live-cell imaging. Scale bar,
100 µm. The data are a representative set of images from two biologically
independent experiments. d, Screenshot of the genomic locus surrounding
human CTC-534A2.2 taken from ENSEMBL. e, Schematic of the screen
performed in RPE1-hTERT TP53−/− cells stably expressing Cas9 to study
genes mediating ionizing radiation-sensitivity. f, g, Competitive growth
assays measuring the capacity of the indicated sgRNAs to cause resistance
to etoposide (100 nM) in RPE1 wild-type cells (f) or PARPi (16 nM)
in RPE1 BRCA1ᴷᴼ cells (g). Data are presented as the mean fraction of
GFP-positive cells ± s.d., normalized to day 0 (n = 3, independent viral
transductions). Gene-editing efficiencies of the sgRNAs can be found
in Supplementary Table 2. h, Talazoparib sensitivity in 11 SHLD1ᴷᴼ
SUM149PT clones obtained after co-transfection of tracrRNA and one of
four distinct SHLD1 crRNAs (5-1, 5-2, 5-3 or 5-5). Each clone was exposed
to talazoparib in a 384-well plate format for 5 days. As a comparison,
talazoparib sensitivity in parental SUM149PT cells with wild-type
SHLD1 (WT) is shown, as is talazoparib resistance in a BRCA1 revertant
subclone (BRCA1-rev) of SUM149PT. Bars represent the mean ± s.d.
(n = 4 biologically independent experiments). ANOVA was performed
for each SHLD1ᴷᴼ clone versus wild type using Dunnett correction for
multiple comparisons, P < 107°. Gene-editing efficiencies can be found
in Supplementary Table 2. i, BRCA1ᴷᴼ and BRCA1ᴷᴼSHLD2ᴷᴼ cells
were virally transduced with expression vectors for GFP alone or
GFP-SHLD2. Sensitivity to olaparib (200 nM) was determined by a short-term
survival assay in the presence of 1 µg ml−1 doxycycline to induce protein
expression. Data are represented as dots for every individual experiment
with the bar representing the mean ± s.d. (n = 3).
Extended Data Fig. 2 | Shieldin inhibits homologous recombination.
a, Representative micrographs of RAD51 focus formation in the indicated
RPE1 cell lines (data quantified in Fig. 2d, n > 3). b, Traffic light reporter
assay testing RPE1 BRCA1ᴷᴼ cells virally transduced with sgRNAs
targeting 53BP1 or SHLD3. Data are represented as dots for individual 
experiments with the bar representing the mean 4 
Gene-editing efficiencies of the sgRNAs can be found in Supplementary
Table 2. c, Representative flow cytometry plots of cells analysed with
the traffic light reporter assay (data quantified in Fig. 2e, n > 3).
d, Representative flow cytometry plots of cells analysed with the traffic
light reporter assay (data quantified in b).
Extended Data Fig. 3 | Mouse shieldin promotes resistance to PARP
inhibition in Brca1-mutated cells and tumours. a, Clonogenic survival
assays of transduced KB1P-G3 cells treated with indicated olaparib
doses ± ATM inhibitor (ATMi) KU60019 (500 nM). On day 6, the ATMi
alone and untreated groups were stopped and stained with 0.1% crystal
violet; the other groups were stopped and stained on day 9. Data shown are
representative of 3 biologically independent experiments (with 3 technical
replicates each). b, Left, quantification of RAD51 focus formation in
parental KB1P-G3 (Brca1−/−;Trp53−/−) cells or KB1P-G3 cells that were
transduced with the indicated lentiviral sgRNA vectors. Cells were fixed
without treatment or 4 h after irradiation (10-Gy dose). Each data point
represents a microscopy field containing a minimum of 50 cells; the bar
represents the mean ± s.d. (n = 15). Right, representative micrographs of
RAD51-negative and RAD51-positive cells (the latter is indicated by an
arrowhead). DNA was stained with DAPI. c, Clonogenic survival assay
of Rosa26CreERT2/wt;Brca1Δ/Δ;p53-null mouse embryonic stem cells virally
transduced with the indicated sgRNA and treated without or with 15 nM
olaparib for 7 d. Gene-editing efficiencies of the sgRNAs can be found in
Supplementary Table 2. Data shown are representative of 3 biologically
independent experiments (with > 2 technical replicates each).
d, Clonogenic survival assay of Rosa26CreERT2/wt;Brca1SCo/Δ mouse
embryonic stem cells virally transduced with the indicated sgRNA and
treated without or with 0.5 µM tamoxifen to induce BRCA1 depletion.
Gene-editing efficiencies of the sgRNAs can be found in Supplementary
Table 2. Data shown are representative of 2 biologically independent
experiments (with 3 technical replicates each).
Extended Data Fig. 4 | Shieldin localizes to DSB sites. a, Representative
micrographs of the experiments quantified in Fig. 3c. b, Representative
micrographs of the experiments quantified in Fig. 3e. c, Quantification
of mRNAs for SHLD1, SHLD2 and SHLD3. RPE cells were transfected
with siCTRL (non-targeting control siRNA) or siRNA targeting the
indicated shieldin subunits. Forty-eight hours after transfection, mRNA
was purified and reverse-transcribed before being assayed by quantitative
real-time PCR. Data were normalized to the amount of GAPDH mRNA
and expressed relative to the corresponding value for cells transfected
with siCTRL. Data are presented as the mean ± s.d. (n = 3, independent
siRNA transfections). d, Whole cell extracts from RPE1 wild-type cells
transfected with the indicated siRNAs were processed for immunoblotting
with the indicated antibodies. Tubulin is used as a loading control (n = 1
experiment; siRNA efficiency is also monitored by immunofluorescence).
e, Quantification of 53BP1 and RIF1 recruitment to ionizing
radiation-induced DSBs (1 h after irradiation with 10 Gy) following depletion of the
indicated shieldin components. Data are represented as the mean ± s.d.
(n = 3, independent siRNA transfections). f, Representative micrographs
of the experiments quantified in e.
Extended Data Fig. 5 | Epistasis between 53BP1 and shieldin factors.
a, Quantification of RAD51 focus formation 3 h after irradiation
(10 Gy) in RPE BRCA1ᴷᴼ (left), BRCA1ᴷᴼ53BP1ᴷᴼ (middle) and
BRCA1ᴷᴼSHLD2ᴷᴼ (right) cells after viral transduction with the indicated
sgRNAs (editing efficiency can be found in Supplementary Table 2)
or empty vector (EV). Data are represented as the mean ± s.d. (for
BRCA1ᴷᴼ53BP1ᴷᴼ, n = 4 biologically independent immunofluorescence
experiments; for BRCA1ᴷᴼ and BRCA1ᴷᴼSHLD2ᴷᴼ, n = 6 biologically
independent immunofluorescence experiments). P values were calculated
using a two-tailed unpaired t-test. Left, BRCA1ᴷᴼ: EV versus sg53BP1-1
P = 0.0002; EV versus sgSHLD1-1 P = 0.0043; EV versus sgSHLD2-2
P = 0.0348; EV versus sgSHLD3-1 P = 0.0180; EV versus sgREV7-1
P = 0.0012. Middle, right: all comparisons to the EV condition were
non-significant (NS). Values for BRCA1ᴷᴼ53BP1ᴷᴼ: EV versus sg53BP1-1
P = 0.2332; EV versus sgSHLD1-1 P = 0.4451; EV versus sgSHLD2-2
P = 0.9632; EV versus sgSHLD3-1 P = 0.1187; EV versus sgREV7-1
P = 0.0568. Values for BRCA1ᴷᴼSHLD2ᴷᴼ: EV versus sg53BP1-1
P = 0.0550; EV versus sgSHLD1-1 P = 0.1864; EV versus sgSHLD2-2
P = 0.3568; EV versus sgSHLD3-1 P = 0.4641; EV versus sgREV7-1
P = 0.2888. b, Talazoparib sensitivity of wild type or two independent
SHLD1ᴷᴼ SUM149PT-dox-Cas9 clones (A and D) virally transduced
with an sgRNA targeting 53BP1 (sg53BP1) or a control non-targeting
sgRNA (sgCtrl), following induction of Cas9. Data are presented as the
mean ± s.d. (n = 3 biologically independent experiments).
Extended Data Fig. 6 | The co-localization of shieldin with RIF1 on
chromatin. a, Representation of the deletion mutants of SHLD2-N used
in c, d. The orange shading indicates blocks of homology. b, Schematic
of the LacR-RIF1 chromatin recruitment assay. c, Quantification of the
experiment shown in d. Colocalization was considered positive when
the average GFP intensity at the mCherry focus was threefold over
background nuclear intensity. A minimum of 20 cells were imaged per
biological replicate (circles); the bar represents the mean ± s.d. (n = 3).
d, Representative images of the data quantified in c. The main focus
is shown in inset; scale bar, 10 µm. e-h, Quantification (e, g) and
representative micrographs (f, h) of overexpressed GFP-SHLD2-N and
mCherry-LacR-RIF1(1-967) co-transfected into uninduced
U2OS-FokI cells along with siRNA against shieldin complex subunits after
processing for mCherry and GFP (e, f) or mCherry and REV7 (g, h)
immunofluorescence. Colocalization was considered positive when the
average GFP or REV7 intensity at the mCherry focus was threefold over
background nuclear intensity. A minimum of 20 cells were imaged per
condition (circles); the bar represents the mean ± s.d. (n = 3 biologically
independent experiments). i, Representative images of the data quantified
in j. The main focus is shown in inset; scale bar, 10 µm. j, Quantification
of GFP intensity at the mCherry-LacR-RIF1(1-967) focus, normalized
to nuclear background. Each data point represents a cell transfected with
the vector coding for the indicated GFP fusion. The line is at the median.
The data are an aggregate of three independent experiments with a
minimum of 20 cells counted (total cells counted: 62, 60 and 61 for GFP,
GFP-SHLD2-C and GFP-SHLD3, respectively). k, mCherry-LacR-FokI
colocalization with full-length or N-terminally truncated (Δ1-50)
GFP-SHLD2. Mean normalized focus intensity is shown from a total of 59
(full-length SHLD2) or 56 (SHLD2 Δ1-50) cells counted (n = 2 biologically
independent experiments).
Extended Data Fig. 7 | Mapping the architecture of the shieldin
complex. a, Streptavidin pulldown analysis determining which region
of SHLD2 associates with the other shieldin subunits. WCEs of 293T
cells transfected with expression vectors for Flag-SHLD1,
V5-SHLD3, GFP-REV7 and Strep/HA-tagged SHLD2, SHLD2-N (residues
2-420), SHLD2-C (residues 421-904) or empty Strep/HA vector
(EV) were incubated with streptavidin resin and bound proteins were
eluted with biotin. WCEs and elutions were analysed by SDS-PAGE
and immunoblotting with the indicated antibodies. Tubulin was used
as a loading control. Results are a representative set of immunoblots
from two independent experiments. Asterisk denotes a non-specific
band. b, Mapping the SHLD3 and REV7 binding sites on the SHLD2
N terminus through streptavidin pulldowns with different SHLD2
constructs (detailed in Extended Data Fig. 6a) and immunoblotting.
Results are representative of a set of immunoblots from three
independent experiments. c, Affinity purification of shieldin complex
components using N-terminally truncated SHLD2 (Δ1-50) analysed
by immunoblotting (representative of three independent experiments).
d, Streptavidin pulldown analysis of SHLD2 association with REV7 and
SHLD3. 293T cells were transfected with siRNAs and expression vectors
for epitope-tagged shieldin components as indicated (EV, empty Strep/HA
vector). WCEs were incubated with streptavidin resin and bound proteins
were eluted with biotin. WCEs and elutions were analysed by SDS-PAGE
and immunoblotted with the indicated antibodies. Short and long
exposures are shown for GFP and V5 immunoblots (n = 1). e, Dependency
of V5-SHLD3 co-immunoprecipitation with GFP-REV7. 293T cells were
transfected with siRNAs and expression vectors for epitope-tagged REV7
and SHLD3 as indicated (EV, empty V5 vector). WCEs were incubated
with anti-V5 antibody and protein G resin. Bound proteins were boiled
in SDS sample buffer and analysed by immunoblotting with GFP and
V5 antibodies (n = 1). f, Association between SHLD3 and RIF1. WCEs
of 293T cells transfected with an expression vector for unfused GFP (-)
or GFP-SHLD3 (SHLD3) were incubated with GFP-Trap resin. Bound
proteins were boiled in SDS sample buffer and analysed by SDS-PAGE and
immunoblotting against 53BP1 and RIF1. Results are representative of 2
SHLD3 immunoprecipitations, using SHLD3 fused to GFP (shown here)
and V5 (shown in Fig. 3g) affinity tags.
Extended Data Fig. 8 | Controls supporting the role of shieldin in
promoting physiological NHEJ. a, Representative dot plots of the flow
cytometry data obtained (of n = 3 biologically independent experiments)
to assess class switching in Fig. 3h. Class switch recombination was
determined as the percentage of IgA+ cells following stimulation after
subtracting the baseline percentage of IgA+ cells in the indicated clones
(values in parentheses). b, c, Epistasis analysis of shieldin and 53BP1
in class switch recombination. The percentage of class switching in
CH12F3-2 wild type, single knockout or double knockout cells (as
indicated) following stimulation is shown. Each data point represents a
biological replicate; the line represents the mean ± s.d. (n = 3). Genomic
editing efficiencies of the sgRNAs can be found in Supplementary Table 2.
d, WCEs of the indicated CH12F3-2 clones were probed for AID and
β-actin (loading control) by immunoblotting and were quantified by
densitometry. Each data point represents a biological replicate; the line
represents the mean ± s.d. (n = 9 for wild type, n = 3 for other samples).
e, Random plasmid integration of linearized pcGFP-cl conferring G418
resistance. Resistant colonies were quantified after 14 d. Bar represents the
mean ± s.d. with wild-type cells set at 100% (left, n = 5; right, n = 4 except
SHLD2KO (2.7), n = 3 biologically independent experiments).
f, Representative images of the plasmid integration assays quantified in
e. g, Un-irradiated CH12F3-2 clones were immunoblotted for RPA32
(also known as RPA2) phosphorylation (a representative set from n = 3
biological replicates; data relates to Fig. 3i).
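
The two quantification steps named in this legend (subtracting the unstimulated IgA+ baseline, and expressing values relative to wild type set at 100%) can be sketched as follows. This is a minimal illustration only: the function names and all numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def baseline_corrected(stimulated_pct, baseline_pct):
    """Percentage of switched cells after subtracting the unstimulated baseline."""
    return stimulated_pct - baseline_pct

def percent_of_wild_type(values, wild_type_values):
    """Express each value relative to the wild-type mean, which is set at 100%."""
    wt_mean = mean(wild_type_values)
    return [100.0 * v / wt_mean for v in values]

# Hypothetical (stimulated %, baseline %) pairs for three biological replicates.
wild_type = [baseline_corrected(s, b) for s, b in [(50.0, 2.0), (53.0, 1.0), (53.0, 3.0)]]
knockout = [baseline_corrected(s, b) for s, b in [(6.0, 2.0), (7.0, 1.0), (8.0, 3.0)]]

normalized = percent_of_wild_type(knockout, wild_type)
print(f"knockout CSR: {mean(normalized):.1f} ± {stdev(normalized):.1f} % of wild type")
```

Normalizing after baseline subtraction keeps clones with different background IgA+ fractions comparable; whether the original analysis applied the steps in exactly this order is an assumption.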
Extended Data Fig. 9 | The role of DSB-targeted SHLD2 in the
suppression of homologous recombination and the mapping of the
SHLD2-C-SHLD1 complex binding to ssDNA. a, Representative
micrographs of RPE1 BRCA1KO 53BP1KO cells transduced with the
indicated GFP-fusion proteins, pre-extracted, fixed and stained for RAD51
and GFP 3 h after ionizing radiation (10 Gy). Protein expression was
induced for 24 h before exposure to ionizing radiation using 1 µg ml−1
doxycycline. Data relates to Fig. 4b. Note that owing to the pre-extraction
required for visualization of RAD51 foci, the visualization of
non-FHA-tagged SHLD2 is lost. b, SDS-PAGE analysis of purified
SHLD2-C-SHLD1 complexes. Strep/HA-SHLD2(421-904)-Flag-SHLD1 complexes
were purified from transiently transfected 293T cells. Concentrations of
purified proteins were estimated by Coomassie staining and comparison to
a standard curve of known BSA concentrations visualized by fluorescence
at 700 nm. SHLD2-C m1 and SHLD2(S)-C denote SHLD2-C constructs
carrying the OB-fold m1 mutation and the internal deletion (Δ655-723)
corresponding to the naturally occurring splice variant of SHLD2,
respectively. Open and filled arrowheads mark the bands corresponding to
SHLD2-C and SHLD1, respectively. EV refers to empty Strep/HA vector.
A representative stained gel from two independent experiments is shown.
c, Representative image of the 32P-labelled ssDNA EMSA with
SHLD2-C-SHLD1 for Kd determination shown in Fig. 4e. d, Model of the SHLD2
OB-fold domains and the engineered mutations (red spheres, point mutations;
red ribbons, splice variant deletion). Model relates to Fig. 4b, d.
Extended Data Fig. 10 | SHLD2 OB-folds are required for suppression
of RAD51 focus formation induced by ionizing radiation.
a, Quantification of RAD51 foci 3 h after 10 Gy irradiation in RPE1
BRCA1KO SHLD2KO cells complemented with the indicated GFP-tagged
SHLD2 constructs via viral transduction. Protein expression was
induced with 1 µg ml−1 doxycycline for 24 h before exposure to ionizing
radiation. Each data point is a biological replicate; the bar represents the
mean ± s.d. (n = 6 for BRCA1KO untransduced cells, BRCA1KO SHLD2KO
untransduced and GFP-SHLD2 cells; n = 3 for remaining cell lines,
biologically independent experiments). b, Representative micrographs of
the data shown in a. Note that owing to the pre-extraction required for
visualization of RAD51 foci, the visualization of non-FHA-tagged SHLD2
foci is lost. c, Representative micrographs of RPE1 BRCA1KO SHLD2KO
cells virally transduced with vectors expressing GFP-tagged SHLD2 wild
type or its OB-fold m1 mutant (m1), or short splice variant (S), 1 h after
5 Gy ionizing radiation. Scale bar, 10 µm. d, Quantification of the data
shown in c. Each data point represents an independent biological replicate
counting > 100 cells. Data are represented as mean ± s.d. (n = 3). e, WCEs
of 293T cells co-transfected with Strep/HA-SHLD2 wild type,
Strep/HA-SHLD2 m1 or Strep/HA-SHLD2(S) mutants, and other shieldin
subunits (Flag-SHLD1, V5-SHLD3, and GFP-REV7) were incubated with
streptavidin resin and bound proteins were eluted with biotin. WCEs and
eluted proteins were visualized by SDS-PAGE and immunoblotting with
the indicated antibodies. Results shown are a representative set from two
independent experiments.
53BP1 cooperation with the REV7-shieldin complex
underpins DNA structure-specific NHEJ
Sven Rottenberg, Richard J. Cornall, Catherine M. Green & J. Ross Chapman*


53BP1 governs a specialized, context-specific branch of the
classical non-homologous end joining DNA double-strand break
repair pathway. Mice lacking 53bp1 (also known as Trp53bp1) are
immunodeficient owing to a complete loss of immunoglobulin
class-switch recombination, and reduced fidelity of long-range V(D)J
recombination. The 53BP1-dependent pathway is also responsible
for pathological joining events at dysfunctional telomeres, and
its unrestricted activity in Brca1-deficient cellular and tumour
models causes genomic instability and oncogenesis. Cells that
lack core non-homologous end joining proteins are profoundly
radiosensitive, unlike 53BP1-deficient cells, which suggests that
53BP1 and its co-factors act on specific DNA substrates. Here we
show that 53BP1 cooperates with its downstream effector protein
REV7 to promote non-homologous end joining during class-switch
recombination, but REV7 is not required for 53BP1-dependent
V(D)J recombination. We identify shieldin, a four-subunit
putative single-stranded DNA-binding complex comprising REV7,
C20orf196 (SHLD1), FAM35A (SHLD2) and FLJ26957 (SHLD3),
as the factor that explains this specificity. Shieldin is essential for
REV7-dependent DNA end-protection and non-homologous
end joining during class-switch recombination, and supports
toxic non-homologous end joining in Brca1-deficient cells, yet is
dispensable for REV7-dependent interstrand cross-link repair.
The 53BP1 pathway therefore comprises distinct double-strand
break repair activities within chromatin and single-stranded DNA
compartments, which explains both the immunological differences
between 53bp1- and Rev7-deficient mice and the context specificity
of the pathway.

53BP1-dependent non-homologous end joining (NHEJ) requires
the participation of several downstream factors, including REV7,
which is a non-catalytic subunit of the translesion synthesis DNA
polymerase ζ. REV7 mediates genotoxic NHEJ events in Brca1-deficient
mouse mammary tumour cells, and prevents end-resection at
double-strand breaks (DSBs) associated with class-switch recombination
(CSR). We generated a conditional Rev7 knockout mouse (Rev7f/f)
in a C57BL/6 background, after germline Rev7 deletion resulted in
embryonic lethality (Extended Data Table 1). Rev7f/f mice that possess
the B cell Mb1-cre deleter allele showed Rev7 deletion at the start of
the B cell lineage (Extended Data Fig. 1a), and normal B cell numbers
with undetectable REV7 protein (Extended Data Fig. 1b). Rev7f/fMb1+/cre
mice expressed normal serum IgM but reduced titres of IgA
and IgG, which is suggestive of CSR failure (Fig. 1a). Defective in vivo
CSR was confirmed in immunization experiments in which serum
concentrations of antigen-specific IgG1 in Rev7f/fMb1+/cre mice were
about 10-fold lower than in control Rev7+/+Mb1+/cre mice, whereas
IgM responses were comparable between groups (Fig. 1b). Likewise,
Rev7 deletion severely compromised (by up to 90%) the production
of class-switched B splenocytes upon stimulation in culture (Fig. 1c and
Extended Data Fig. 1c), without affecting cell proliferation (Fig. 1d).
Equivalent CSR frequencies in Rev7f/fMb1+/cre, 53bp1−/−Mb1+/cre and
53bp1−/−Rev7f/fMb1+/cre double knockout cells furthermore confirmed
that REV7-53BP1 cooperation is essential for CSR (Fig. 1e).

However, notable differences in the absolute numbers of B lineage
cells were detected between Rev7f/fMb1+/cre and 53bp1−/−Mb1+/cre
mice (Fig. 1f). Though 53bp1-deficient mice showed 50% and 29%
reductions in B220+ B lymphocytes in the bone marrow and spleen,
respectively, these abnormalities were absent from Rev7f/fMb1+/cre
animals (Fig. 1f), leading us to question whether 53BP1-dependent DNA
repair activities during B cell development require REV7. Detailed bone
marrow analysis showed that B lymphocytes of 53bp1−/−Mb1+/cre mice
became progressively depleted, with losses of approximately 70% of
total lymphocytes by the late small pre-B and immature B cell stages
(Hardy fractions D and E, respectively; Fig. 1g and Extended Data
Fig. 2a). Losses were accompanied by increased apoptosis in bone
marrow and follicular splenic B cell fractions (Extended Data Fig. 2b, c).
By contrast, Rev7f/fMb1+/cre mice showed normal B cell counts and
apoptotic indices (Fig. 1g and Extended Data Fig. 2a-c), despite the
complete absence of REV7 protein in B220+CD43+ pro-B progenitors
(Extended Data Fig. 2d). To exclude the possibility that developmental
defects in Rev7f/fMb1+/cre mice could be masked by compensatory
changes, we generated chimaeric mice with mixed bone marrow cells.
Equal mixes of CD45.1 wild-type and CD45.2 Rev7f/fMb1+/cre whole
bone marrow cells, or CD45.1 wild-type and CD45.2 53bp1−/−Mb1+/cre
whole bone marrow cells, were injected intravenously into lethally
irradiated CD45.1 wild-type recipient mice (Extended Data Fig. 2e).
Eight weeks after bone marrow transfer, the reconstitution of pro-,
pre-, immature and mature bone marrow B cells derived from CD45.2
Rev7f/fMb1+/cre cells did not differ from that of wild-type CD45.1 cells,
similar to that in control recipient mice (data not shown). However,
bone marrow B cells derived from CD45.2 53bp1−/−Mb1+/cre mice
only reconstituted about 30% of pro-B cell fractions, and were further
outcompeted by wild-type CD45.1 bone marrow B cells at later stages,
at which point they made up only about 15% of total immature and
mature fractions (Extended Data Fig. 2e). These findings confirmed
that B cell development is normal in the absence of REV7, which
suggests that V(D)J recombination, which is essential for this process,
does not require REV7.

To directly test whether 53BP1-dependent NHEJ during V(D)J
recombination and CSR could be distinguished at the level of REV7
involvement, we monitored the stability of the Igh and Igλ light-chain
(Igl) loci in stimulated B splenocytes from Rev7f/fMb1+/cre, 53bp1−/−
and control (Mb1+/cre) mice. Upon stimulation with anti-CD40 plus
IL-4 in vitro, NHEJ-deficient splenic B cells accumulate Igl breaks
owing to abortive RAG1- and RAG2-dependent secondary V-J
recombination events, as well as Igh breaks associated with CSR.
Chromosome breakage at Igh and Igl was monitored by four-colour
Fig. 1 | REV7 and 53BP1 cooperate during CSR, yet are functionally
uncoupled in V(D)J recombination. a, Serum immunoglobulin in
Rev7+/+Mb1+/cre and Rev7f/fMb1+/cre mouse cohorts. n = 11 mice per
genotype; P values, unpaired two-tailed t-test. Mean ± 95% confidence
interval. b, NP-specific serum IgM (left) and IgG1 (right) at indicated
times after NP-CGG immunization. AU, arbitrary units. Representative
data, n = 2 independent experiments, each with 4 mice. Mean ± 95%
confidence interval. c, Cell trace violet (CTV)-labelled splenic B cells
were stimulated as indicated and stained for surface IgG1 or IgE on
day 4. Representative data, n > 6 mice. d, CTV dilution in purified B
cells cultured in the presence of LPS and IL-4 for 96 h. Representative
data, n > 6 mice. e, Splenic B cells cultured with the indicated stimuli
(96 h) and stained for surface IgG1, IgE, IgG2b or IgG3. n = 4 mice per
genotype. CSR 100%, mean immunoglobulin isotype switch frequency
of 2 control animals in each experiment. P values, two-way ANOVA
with Tukey's correction. Mean ± 95% confidence interval. f, Absolute
numbers of B220+ B cells in the bone marrow (one femur plus one
tibia) and spleen. n = 9 mice per genotype, except Rev7+/+Mb1+/cre
(n = 8). P values, unpaired two-tailed t-test. Mean ± 95% confidence
interval. g, Absolute numbers of B cell precursors (Hardy fraction A,
B220+CD43+BP-1−CD24−; Hardy fraction B, B220+CD43+BP-1−CD24+;
Hardy fraction C, B220+CD43+BP-1+CD24+; Hardy fraction D,
B220+CD43−IgM−IgD−; Hardy fraction E, B220+CD43−IgM+IgD−; and
Hardy fraction F, B220+CD43−IgM+IgD+) in the bone marrow (one
femur and one tibia) from Rev7+/+Mb1+/cre (n = 8), Rev7f/fMb1+/cre (n = 9)
and 53bp1−/−Mb1+/cre (n = 9) mice. P values, unpaired two-tailed t-test.
Mean ± 95% confidence interval. h, Top, schematic of the Igh and Igl loci
and fluorescence in situ hybridization probes. Bottom, representative
metaphase images showing normal and abnormal Igh and Igl loci. C,
centromere-proximal; T, telomere-proximal. i, Igh and Igl locus breakage
in splenic B cells of indicated mice upon stimulation (anti-CD40 + IL-4)
for 96 h. n = 4 mice per genotype, between 98 and 151 metaphases were
analysed from each mouse, except for one wild-type sample with only 45
metaphases; multiple comparisons, one-way ANOVA. Mean ± s.d.

fluorescence in situ hybridization on metaphase spreads, using
probes that were positioned centromeric and telomeric to these loci.
Metaphases were classed as abnormal if either locus had bi-allelic
centromeric BAC signals, with one allele lacking an adjacent signal
from the telomeric BAC (Fig. 1h). As expected, 53bp1−/− B cells had
high levels of chromosomal abnormalities at both the Igh and Igl loci,
unlike control cells for which locus breakage was rare (Fig. 1i). Three
out of four Rev7f/fMb1+/cre mice exhibited breakage frequencies at the
Igh locus equivalent to those in 53bp1−/− mice. By contrast, Igl
abnormalities in Rev7f/fMb1+/cre cells occurred at near-control frequencies,
Fig. 2 | Mutational screening reveals REV7 interaction surfaces
essential for CSR and resection inhibition. a, Schematic of the screen.
Rev7 single and combinatorial point mutant alleles were stably expressed
in Rev7−/− CH12-F3 cells. IgM-to-IgA CSR was measured 40 h after
stimulation with anti-CD40 antibody, IL-4 and TGFβ1. b, Quantification
of IgM-to-IgA CSR in Rev7−/− CH12-F3 cell lines complemented as
indicated. IgA switching efficiency normalized against wild-type REV7
complemented cells. Mutants indicated in red were excluded from further
analysis owing to unstable REV7 protein expression. n = 3 independent
experiments. Mean ± s.d. c, CSR in stimulated B cells collected from
Rev7f/fMb1+/cre mice and infected with retrovirus expressing GST (control)
or indicated REV7 protein. n = 6 mice per genotype. IgG1+ events in
infected cells as a proportion (%) of the mean IgG1+ events in two
wild-type REV7-complemented controls. n = 3 experiments (each with 2 mice
per genotype). Mean ± s.d. d, Representative flow cytometry plots of data
shown in c.

consistent with a selective requirement for 53BP1 but not REV7 in
V(D)J recombination.

To determine the function of REV7 in CSR, we deleted Rev7 in
CH12-F3, a mouse B lymphoma cell line that undergoes efficient
switching from IgM to IgA upon stimulation in vitro. CSR frequencies
in Rev7−/− CH12-F3 cells were increased approximately 10-fold by
reconstitution of REV7 expression compared to a GST control (Fig. 2a),
and more than 20 REV7 single or combinatorial point mutants were
screened for their ability to restore CSR (Fig. 2b and Extended Data
Fig. 3a, b). Mutations that disrupt protein interactions,
post-translational modifications or a destruction-box degron were
selected, and structure-led conservation analysis guided the generation
of mutants within putative protein-interaction surfaces (summarized
in Extended Data Table 2). Stable CSR-defective REV7 mutants were
re-tested for their ability to support CSR to IgG1 upon complementation
in stimulated B cells from Rev7f/fMb1+/cre mice (Fig. 2c, d). Two
mutants, REV7(Y63A) and REV7(K129A), failed to rescue CSR to
levels above those of control-complemented cells in both screens. The
capacity of these mutants to prevent the hyper-resection of DSBs in the
Igh switch region, which is a central function of the 53BP1-RIF1-REV7
axis during CSR, was therefore assessed by replication protein A
(RPA)-single-stranded DNA (ssDNA) chromatin immunoprecipitation
(ChIP) across Igh and control loci. As expected, DSB-associated
donor (Sμ) and acceptor (Sα) switch loci, but not non-targeted Igh
(Sγ1) or control (Rpp30) loci, exhibited aberrant RPA-ssDNA enrichments
in Rev7−/− cells (Extended Data Fig. 3c). RPA enrichments
were suppressed in cell lines expressing wild-type REV7 but not the
REV7(Y63A) or REV7(K129A) mutants, which confirms that these
mutations compromise resection inhibition (Extended Data Fig. 3c).

REV7 Tyr63 is one of two evolutionarily conserved residues that
mediate interactions between the C-terminal 'safety-belt' domain and
two conserved REV7-binding motifs in REV3L (RBM1 and RBM2;
consensus PxxxpPSR), each of which is essential for REV3L-dependent
resistance to interstrand cross-link damage. Alanine substitutions
at Tyr63 or Trp171 in REV7 blocked interactions with the RBM1 of


REV3L, in a yeast two-hybrid (Y2H) assay (Extended Data Fig. 4a). As
expected, Rev7−/− cells were sensitive to the interstrand cross-link
inducer MMC, first accumulating in G2/M phase, and then progressing
into cell death marked by an accumulation of sub-G1 events and
profound chromosomal instability (Extended Data Fig. 4b-d). These
defects were completely suppressed by complementation with wild-type
REV7 but not the REV7(Y63A) or REV7(W171A) mutants, or
the REV1-binding mutants REV7(L186A) or REV7(Y202A) (Extended
Data Fig. 4b, e). By contrast, the CSR-deficient REV7(K129A) mutant
completely restored wild-type responses to MMC, consistent with the
unperturbed interaction between this mutant and REV3L (Extended
Data Fig. 4a, e). This separation of function between REV7 interstrand
cross-link repair and NHEJ activities implicates distinct REV7
complexes in NHEJ.

We therefore immunopurified control and Flag-HA-REV7 complexes
from lysates of stably complemented Rev7−/− CH12-F3 cells, and
analysed these by liquid chromatography-tandem mass spectrometry
(LC-MS/MS). Aside from the known interactors REV3L and GTF2I,
three uncharacterized proteins were highly enriched with REV7 (Fig. 3a
and Table 1). The genes that encode these proteins, C20orf196, FAM35A
and FLJ26957 (also known as CTC-534A2.2), renamed SHLD1,
SHLD2 and SHLD3, respectively, were cloned and the corresponding
human proteins were screened for interaction with wild-type and mutant
REV7 by Y2H. In this assay, only SHLD3 showed direct interaction
with REV7; this was abolished by REV7(Y63A) but was unaffected by
REV7(W171A) (Fig. 3b and Table 1), which correlates perfectly with
the requirement for Tyr63 but not Trp171 in NHEJ. SHLD3 comprises
two N-terminal REV3L-like RBM motifs, and predicted structural folds
that resemble the mRNA cap-binding domain of the translation
initiation factor EIF4E (Fig. 3c; folds recognized by Phyre2).
REV7-SHLD3 interactions rely predominantly on Pro53 and Pro58
in RBM2 (Fig. 3c), which is a motif that resembles RBM1 in REV3L
(Extended Data Fig. 5a).

Fig. 3 | REV7 interacts with SHLD3 and SHLD1-SHLD2 via distinct

The CSR defects exhibited by REV7(K129A) could not be explained
by a failure to interact with SHLD3 (Fig. 3b). Lys129 is a highly
conserved residue on an uncharacterized structural surface that was
selected upon analysis of the REV7 crystal structure (Extended Data
Fig. 5b). A mutant containing a conservative arginine substitution of
Lys129 completely restored wild-type CSR frequencies (Fig. 2b and
Extended Data Fig. 3b), which eliminated consideration of
post-translational modification. Therefore, we compared the compositions of
the Flag-HA-REV7(K129A) and wild-type Flag-HA-REV7 complexes
using LC-MS/MS and label-free quantification (LFQ). Consistent with
Y2H data, Flag-HA-REV7(K129A) complexes retained equivalent levels
of REV3L and SHLD3 but were devoid of SHLD1-SHLD2 (Fig. 3d
and Table 1), implicating SHLD1 and SHLD2 in NHEJ.

Isogenic knockout CH12-F3 clones were therefore generated for
Shld1, Shld2 and Shld3. No changes in REV7 or 53BP1 expression


Table 1 | Label-free quantification of LC-MS/MS results for
indicated interacting proteins

Flag-HA-REV7 (wild type)

Protein symbol       Number of         Coverage (%)   LFQ(a)   LFQ(b)   Y2H interaction
                     unique peptides                                    with REV7
REV7 (MAD2L2)        9                 42             5.1      −0.5     NA
C20orf196 (SHLD1)    5                 34             5.7      −7.8     −
FAM35A (SHLD2)       15                26             6.9      −6.6     −
FLJ26957 (SHLD3)     8                 27             4.3      −1.01    +
REV3L                8                 3              1.8      0.02     +
GTF2I                10                12             2.7      0.95     ND

Interacting proteins were determined as in Fig. 3a, d. NA, not applicable; ND, not determined.
n = 2 independent experiments.
(a) Average log2(fold change (wild type/control)), n = 2.
(b) Average log2(fold change (K129A/wild type)), n = 2.
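
Because both LFQ columns in Table 1 are average log2 fold changes, they can be converted back to linear ratios to gauge the size of each effect. The sketch below does this for three rows whose values are legible in the table; the helper names are hypothetical, and averaging replicates in log2 space is an assumption about how "average log2(fold change)" was obtained.

```python
import math

# (LFQ a, LFQ b) pairs copied from Table 1; both are average log2 fold changes.
# a: wild type / control.  b: K129A / wild type.
table1 = {
    "FAM35A (SHLD2)": (6.9, -6.6),
    "REV3L": (1.8, 0.02),
    "GTF2I": (2.7, 0.95),
}

def log2_to_fold(log2_fc):
    """Convert a log2 fold change into a linear ratio."""
    return 2.0 ** log2_fc

def mean_log2_fold_change(ratios):
    """Average per-experiment ratios in log2 space (assumed averaging scheme)."""
    return sum(math.log2(r) for r in ratios) / len(ratios)

for protein, (wt_vs_control, k129a_vs_wt) in table1.items():
    print(f"{protein}: {log2_to_fold(wt_vs_control):.1f}-fold over control; "
          f"K129A/wild-type ratio {log2_to_fold(k129a_vs_wt):.3f}")
```

Read this way, a value of −6.6 for SHLD2 corresponds to roughly 1% of the wild-type signal, whereas 0.02 for REV3L is essentially unchanged, which matches the statement that REV7(K129A) complexes retain REV3L but lose SHLD1-SHLD2.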
and positions of alanine substitutions in each mutant protein. Bottom,
interactions between REV7 and indicated SHLD3 proteins by Y2H.
Representative data (n = 4). d, LC-MS/MS and LFQ analysis of
Flag-HA-REV7 and Flag-HA-REV7(K129A) interactomes. The scatter plot
depicts log2 fold-enrichment of interacting proteins across 2 independent
experiments. WT, wild type.


were identified in these knockout cell lines (Extended Data Fig. 6a),
which suggests a direct role for a complex formed by REV7 and all
three SHLD proteins in NHEJ. Normal proliferation but diminished
CSR was detected in all clones (Fig. 4a, b and Extended Data Fig. 6b).
CSR was restored in Shld3−/− and Shld2−/− cells by integration of Shld3
and Shld2 transgenes, respectively (Fig. 4c and Extended Data Fig. 6c),
and mutant Shld3 transgenes revealed that RBM1 and RBM2 support
CSR redundantly (Fig. 4c and Extended Data Fig. 6d). Shld3 deletion
did not augment CSR defects in Rev7−/− CH12-F3 cells (Extended Data
Fig. 6e), confirming epistasis between Shld3 and Rev7 in NHEJ. These
data indicate that REV7 forms the linchpin in a four-subunit protein
complex, the integrity of which is essential for 53BP1-dependent CSR. The
C-terminal safety-belt of REV7 mediates interaction with SHLD3, and
residues centred around Lys129 mediate binding to SHLD1-SHLD2
heterodimers. Structural modelling of SHLD2 revealed a
triple-tandem OB-fold architecture with structural homology to the core
ssDNA-binding OB-folds 2-4 in fungal RPA70 (Extended Data
Fig. 5c, d). We and others have named this putative ssDNA-binding
complex 'shieldin' and its subunits as SHLD1, SHLD2 and SHLD3, by
analogy with the telomere end-capping complex shelterin, the ssDNA
and double-stranded DNA (dsDNA) binding proteins of which
cooperate in chromosome end protection.

RPA-ssDNA ChIP experiments confirmed the contribution of
shieldin to DNA-end protection: RPA-ssDNA complexes were
undetectable at donor (Sμ) and acceptor (Sα) loci in stimulated cultures
of wild-type and SHLD3-complemented Shld3−/− CH12-F3 cells,
yet highly enriched in Rev7−/−, Shld2−/− and Shld3−/− cells (Fig. 4d
and Extended Data Fig. 6f, g). Thus, REV7-shieldin inhibits DSB
resection during NHEJ. However, Shld3−/− cell lines were
indistinguishable from control upon MMC treatment, consistent with REV7
Fig. 4 | The REV7-shieldin complex mediates 53BP1-dependent NHEJ.
a, Representative flow cytometry plots for IgM-to-IgA CSR in indicated
mutant CH12-F3 cell lines. Representative data, n > 3 independent
experiments. b, IgM-to-IgA CSR frequencies in knockout CH12-F3 cell
lines. n = 5 (Shld3−/−), n = 4 (Shld1−/−) and n = 3 (Shld2−/−) independent
experiments. Mean ± s.d. c, IgM-to-IgA CSR in indicated CH12-F3
lines stably transduced with either control (GST), wild-type or
mutant-SHLD3-expressing transgenes. Data normalized to CSR frequencies of
wild-type CH12-F3 cells. n = 5 independent experiments. Mean ± s.d.
d, RPA-ssDNA ChIP with indicated CH12-F3 lines stimulated with
anti-CD40 antibody, IL-4 and TGFβ1 (30 h). Representative data, n = 2
independent experiments. Bars indicate mean. e, Clonogenic assay using
KB1P-G3 cells following deletion of indicated genes. sgRNA, gene targeted
by CRISPR-Cas9 guide RNA. f, Quantification of data in e and Extended
Data Fig. 6k. n = 3 independent experiments. Mean ± s.d. g, Metaphases
prepared from indicated cell lines following 24-h control or olaparib
treatment were analysed for the presence of radial chromosomes. n = 3
independent experiments (each 50 metaphases per condition). Mean ± s.d.
h, Proposed model of shieldin-53BP1 cooperation during NHEJ. The
stabilization of ssDNA overhangs or ssDNA-dsDNA junctions at DSB sites
by REV7-SHLD1-SHLD2-SHLD3 complexes promotes DSB resolution
activities during NHEJ. The absence of lymphocyte development defects
in Rif1−/− mice suggests RIF1 could link 53BP1 anti-resection activities in
chromatin to shieldin activities in ssDNA compartments (arrow).

separation-of-function in interstrand cross-link repair and NHEJ
(Extended Data Fig. 6h).

We next investigated the contribution of shieldin to
53BP1-dependent toxic NHEJ in Brca1-deficient cells. CRISPR-Cas9-mediated
mutagenesis of Shld3 in the Brca1−/−p53−/− mouse mammary tumour
KB1P-G3 cell line was strongly selected for in the presence of the
poly(ADP-ribose) polymerase inhibitor (PARPi) olaparib (Extended
Data Fig. 6i). Moreover, PARPi resistance in Shld3−/− KB1P-G3
cells was equivalent to Rev7−/− KB1P-G3 controls (Fig. 4e, f). There
was neither selection for mutagenesis of Shld3 or Rev7 loci in
olaparib-treated 53bp1−/− KB1P-G3 cells, nor increased resistance in
Shld3−/−53bp1−/− cells (Fig. 4f and Extended Data Fig. 6j, k). Similar
reductions in olaparib-induced radial chromosomes in both Shld3−/−
and 53bp1−/− KB1P-G3 lines confirmed that toxic NHEJ in Brca1-deficient
cells is dependent on shieldin (Fig. 4g). Shieldin disruption therefore
provides a possible route to PARPi resistance in cancer.

This study reveals a requirement for a putative ssDNA-binding
protein complex, shieldin, in the 53BP1 pathway, indicating that multiple
DNA binding activities cooperate during DNA structure-specific NHEJ
(Fig. 4h). The coupling of 53BP1-dependent anti-resection activities


within chromatin to shieldin-dependent stabilization of ssDNA- 
tailed ends may permit the conversion of ssDNA-tailed substrates, 
such as those generated during CSR, into DSBs amenable to NHEJ. 
This segregation of distinct yet cooperative activities at DSB sites can 
explain the developmental differences between 53BP1- and REV7- 
deficient lymphocytes: ssDNA-tails of AID-dependent DSBs induced 
during CSR require these specialized activities to orchestrate their 
joining by NHEJ, whereas the absence of ssDNA at RAG-induced 
DSBs produced during V(D)J recombination could preclude the 
need for shieldin. ssDNA-tailed DSBs additionally exist at uncapped 
telomeres, and potentially at collapsed replication forks, and thus a 
requirement for shieldin during the repair of these structures can 
explain the physiological and pathophysiological specificities of the 
53BP1 system. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0362-1.
METHODS 


Mice. All mice used for this study were generated on, or backcrossed onto, a
C57BL/6 background (>10 generations). Sperm from mice containing the
Mad2l2tm1a(EUCOMM)Wtsi knockout-first conditional allele for Rev7 (MGI:4432091,
a gift from D. Adams, Wellcome Sanger Institute, Hinxton) was used to re-derive
Mad2l2tm1a(EUCOMM)Wtsi/+ mice, which were then bred with constitutive Flp-deleter
mice (Tg(ACTB-Flpe)9205Dym; Jax stock 005703) to generate mice with the
Mad2l2tm1c conditional allele. Experimental Rev7f/fMb1+/cre (Mad2l2tm1c/tm1c) and
related control mice were generated by intercrossing with mice containing the B cell
lineage Mb1-cre deleter strain (Cd79atm1(cre)Reth, MGI:3687451). All experiments
involved age-matched 8-16-week-old animals on an inbred C57BL/6 background.
53bp1−/− mice (MGI:2654201; also C57BL/6) were generated and described
elsewhere. For bone marrow mixed chimaera experiments, wild-type C57BL/6-Ly5.1
mice (hereafter known as CD45.1; Jax stock 002014) were irradiated with two doses
of 4.5 Gy spaced 3 h apart, and subsequently injected with 5-10 × 10⁶ bone marrow
cells (approximately 50:50 mixture) of wild-type CD45.1+ bone marrow and either
wild-type Mb1+/cre, Rev7f/fMb1+/cre or 53bp1−/−Mb1+/cre (CD45.2+) bone marrow.
Baytril 10% antibiotic solution (Vet Services, University of Oxford) was added to
drinking water (1.5 ml to 250 ml water) for 3 weeks after irradiation. Mice were
allowed to reconstitute for 7-8 weeks before analysis. Sample sizes were
determined by power calculations. We were not pursuing lower penetrance phenotypes,
thus statistically significant data could typically be obtained with small group sizes
(typically, 4-10 mice). Randomization of samples was only undertaken during the
scoring of chromosomal aberrations during metaphase analyses. Mice of a certain
genotype were selected based on a unique mouse ID number that does not indicate
mouse genotype, thus phenotype-genotype relationships were determined only at
the data analysis stage. All experiments were approved by the University of Oxford
Ethical Review Committee and performed under a UK Home Office Licence in
compliance with animal use guidelines and ethical regulations.

Immunizations. Mb1+/cre, Rev7f/fMb1+/cre, or Rev7’~Mb1+/cre mice were immu- 
nized intraperitoneally with 50 mg of NP-CGG (Santa Cruz Biotechnologies) emul- 
sioned in Imject Alum adjuvant (Pierce, Thermo Fisher Scientific). Blood samples 
were collected from the tail vein at 0, 7, 14, 21 and 28 days after immunization. 
Enzyme-linked immunosorbent assays. Enzyme-linked immunosorbent assays 
(ELISAs) were used to quantify the production of NP-specific antibodies in mice 
serum. Ninety-six-well plates were coated with 1 ug/ml NP-BSA (Biosearch 
Technologies) in bicarbonate buffer, blocked with 5% milk in PBS and incubated 
with serial dilutions of serum collected at different time points from immunized 
mice. Plates were probed using alkaline phosphatase-coupled antibodies against 
mouse IgM and IgG1 (Southern Biotech). Phosphatase substrate (Sigma) was used 
for detection and optical density measured at 405 nm. For IgG1, pooled blood from 
post-immunization wild-type mice was used as a standard and serially diluted into 
a standard curve. The first dilution was established as 1,000 arbitrary units. For IgM, 
pooled blood from day 7 was used as a standard. Immunoglobulin concentrations 
in mouse serum or culture supernatants were determined by sandwich ELISA. 
Total IgG, IgM and IgA were measured with mouse IgG, IgM and IgA ELISA kits, 
respectively (Bethyl Laboratories), according to the manufacturer’s instructions. 
Mouse serum with known immunoglobulin concentrations of each immunoglob- 
ulin was used as a standard. 
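
As a rough illustration of how optical densities can be converted to arbitrary units against a serially diluted standard, the sketch below interpolates a sample reading onto a standard curve. The twofold dilution series, the OD405 values and the choice of interpolation (linear in OD, logarithmic in units) are assumptions made for this example; the source states only that the first dilution of the standard was defined as 1,000 arbitrary units.

```python
import math
from bisect import bisect_left


def arbitrary_units(od, standard):
    """Interpolate a sample OD405 onto a standard curve.

    standard is a list of (od405, units) pairs. Interpolation is linear in
    OD and logarithmic in units; readings outside the curve are rejected.
    """
    points = sorted(standard)
    ods = [p[0] for p in points]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard curve; re-assay at another dilution")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return float(points[i][1])
    (x0, u0), (x1, u1) = points[i - 1], points[i]
    frac = (od - x0) / (x1 - x0)
    return 10 ** (math.log10(u0) + frac * (math.log10(u1) - math.log10(u0)))


# Hypothetical standard: twofold dilutions, first dilution = 1,000 units.
STANDARD = [(1.8, 1000), (1.4, 500), (1.0, 250),
            (0.6, 125), (0.3, 62.5), (0.15, 31.25)]

print(arbitrary_units(1.0, STANDARD))
print(round(arbitrary_units(0.8, STANDARD), 1))
```

A sample measured at several serum dilutions would normally be read from the dilution that falls in the steep part of the curve and then multiplied by its dilution factor; that step is left out here.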

B lymphocyte analysis and flow cytometry. Cell suspensions from bone marrow 
(one femur and one tibia) and spleen were counted on a haemocytometer and 
stained with anti-mouse antibodies against the following antigens as appropriate 
(all from BioLegend unless otherwise stated) in FACS buffer (PBS with 2% BSA 
and 0.025% sodium azide): IgD (1:500, 405716 or Thermo Fisher 12-5993-82; 
11-26c.2a), IgG1 (1:200, 406606; RMG1-1), IgM (1:500, 406506 or 406514; RMM-1), 
B220 (1:500, 103232, 103244, 103206, or 103212; RA3-6B2), BP-1 (1:200, 108308; 
6C3), CD19 (1:500, 115534 or 115520; 6D5), CD24 (1:1000, 101827 or BD 
Pharmingen 562563; M1/69), CD93 (1:200, 136510; AA4.1), CD23 (1:200, BD 
Pharmingen 553139; B3B4), CD43 (1:200, BD Pharmingen 562865; S7, or Thermo 
Fisher 11-0431-85; eBioR2/60), and CD21 (1:500, BD Biosciences 563176; 7G6). 
Mouse BD Fc Block (1:500, BD Pharmingen 553141) was added to block non- 
specific binding and live/dead cells were discriminated after staining with Zombie 
Aqua viability dye (1:200, 423102) or Zombie NIR viability dye (1:500, 423105). 
Bone marrow cells from mixed chimaeras were also stained against CD45.1 
(1:500, 110730; A20) and CD45.2 (1:500, 109818; 104). Data were acquired on a 
FACSCanto (BD Biosciences), SH800 (Sony), or MoFlo Legacy (Beckman Coulter) 
and were analysed with FlowJo Software v10 (Tree Star). Gating strategies for bone 
marrow (left panel) and spleen (right panel) are shown in Supplementary Fig. 2; 
further Hardy fraction gating strategies are shown in Extended Data Fig. 2a. 

Ex vivo B splenocyte culture, stimulation and flow cytometry. B cells were 
purified from red blood cell-lysed single-cell suspensions of mouse spleens by 
magnetic negative selection using a B Cell Isolation Kit (Miltenyi Biotec, 130- 
090-862). B cells (3 x 10° per well in a 12-well plate) were cultured in RPMI 
supplemented with 10% FCS, 100 U/ml penicillin, 100 ng/ml streptomycin, 2 mM 
L-glutamine, 1x MEM nonessential amino acids, 1 mM sodium pyruvate and 50 µM 
β-mercaptoethanol. B cells were stimulated with 5 µg/ml LPS (Sigma, L7770-1MG), 
10 ng/ml mouse recombinant IL-4 (Peprotech, 214-14-20), and agonist 
anti-CD40 antibody (0.5 µg/ml; Miltenyi Biotec; FGK45.5). Cultures were grown 
at 37°C with 5% CO₂ under ambient oxygen conditions. Four days after seeding, 
stimulated B cells were analysed using a FACSCanto; analysis was performed 
using FlowJo. Cells were resuspended in FACS buffer, blocked with Mouse BD Fc 
Block, and immunostained with biotinylated antibodies as follows: anti-mouse 
IgG1 (1:100, BD Pharmingen 553441; A85-1), anti-mouse IgG2b (1:100, BioLegend 
406704; RMG2b-1), anti-mouse IgG3 (1:100, BD Pharmingen 553401; R40-82) 
and Streptavidin APC (1:500, Thermo Fisher 17-4317-82). Cells expressing IgE 
were assessed using anti-Mouse IgE PE (1:200, BioLegend, 406908; RME-1). Live/ 
dead cells were discriminated after staining with Zombie Aqua viability dye. Cell 
proliferation was assessed using Cell Trace Violet according to manufacturer's 
instructions (CellTrace, Life Technologies). 

Primary B cell reconstitution. Mature untouched B cells were purified as above 
and stimulated with LPS (5 µg/ml, Sigma, L7770) and mouse recombinant IL-4 
(10 ng/ml, Peprotech, 214-14-20). Cultures were grown at 37°C with 5% CO₂ 
under ambient oxygen conditions. Filtered retroviral supernatants, collected 48 h 
after co-transfection of BOSC23 cells with 7 µg pCL-Eco and 7 µg pMX-IRES-GFP-derived 
plasmids, were used to infect LPS/IL-4-stimulated B cell cultures 
in the presence of polybrene (2.5 µg/ml) and HEPES (20 mM) by spinoculation 
(850g for 90 min at 30°C). After a rest period of 4 to 6 h, viral supernatants were 
removed, and replaced with LPS/IL-4-supplemented culture medium. Three days 
later, surface IgG1 expression was determined in populations of gated cells that 
were positive for the expression of an eGFP retroviral reporter. 

CH12-F3 cell culture and CRISPR-Cas9 editing. All CH12-F3 cell lines were 
cultured in RPMI supplemented with 5% NCTC-109 medium, 10% FCS, 100 U/ml 
penicillin, 100 ng/ml streptomycin and 2 mM L-glutamine at 37°C with 5% CO₂ 
under ambient oxygen conditions. Rev7−/−, C20orf196−/− (Shld1−/−), Fam35a−/− 
(Shld2−/−) and Flj26957−/− (Shld3−/−) CH12-F3 cells were generated using CRISPR-Cas9. 
In brief, gene-specific sgRNAs (sgRNA sequences in Extended Data Table 3a) were 
cloned in modified pX330 (Addgene #42230) or pX458 vectors (Addgene #48138). 
CH12-F3 cells were nucleofected (Amaxa Nucleofector 2b, Lonza) with 2 µg of 
plasmid and Cell Line Nucleofector Kit R (Lonza), using program D-023. Isogenic 
cell clones were isolated by limiting dilution (pX330) or single-cell GFP sorting (pX458) 
into 96-well plates. Clones bearing bi-allelic indel mutations were identified by 
native PAGE resolution of PCR amplicons corresponding to edited loci (amplicon 
primer sequences in Extended Data Table 4b), and gene disruption subsequently 
confirmed by Sanger sequencing (sequencing results in Extended Data Table 3b). 
Where antibodies were available, effective target protein ablation was confirmed by 
immunoblotting. Complemented cell lines were generated by lentivirus-mediated 
transduction, using viral supernatants collected from 293T cells co-transfected 
with third generation packaging vectors and pLenti-PGK-PURO-DEST (Addgene 
#19068) or pLenti-PGK-Flag-HA-PURO-DEST vectors containing cloned 
transgene inserts. Typically, cells were spinoculated with polybrene (8 µg/ml) 
and HEPES (20 mM)-supplemented viral supernatants (1500 rpm, 90 min at 
25°C). Stable cell lines were subsequently selected and maintained in the presence 
of puromycin (1 µg/ml). To induce CSR to IgA, CH12-F3 cells were stimulated 
with agonist anti-CD40 antibody (0.5 µg/ml; Miltenyi Biotec; FGK45.5), mouse 
IL-4 (5 ng/l; R&D Systems) and TGFβ1 (2.5 ng/l; R&D Systems). Cell-surface 
IgA expression was determined by flow cytometric staining with anti-mouse IgA- 
FITC antibody (Thermo Fisher; 11-4204-82; MA-6E1). CH12-F3 proliferation was 
monitored by dye dilution using carboxyfluorescein succinimidyl ester (CFSE) 
according to manufacturer’s instructions (CellTrace; Life Technologies). In cell 
cycle experiments, CH12-F3 cells pulse-treated with BrdU for 30 min before fixa- 
tion in 70% ethanol were stained with propidium iodide and rat anti-BrdU-FITC 
(1:100, Bio-Rad MCA2060FT). Certified mycoplasma free CH12-F3 were obtained 
by Cell Services (Francis Crick Institute). These and other cell lines (for example, 
293T, BOSC-23) were confirmed free of mycoplasma contamination. 
Antibodies. Immunoblot primary antibodies used in this study: anti-Rif1: clone 
SK1316; gift of Ian Adams30; anti-histone H3: Abcam ab10799, clone 10799; 
1:2,000; anti-Rev7: BD 612266, clone 14/MAD2B/Rev7, 1:500; anti-53BP1: Novus 
Biologicals NB100-304, 1:2,500; anti-HA-11: BioLegend 901501, Clone 16B12, 
1:2,000; anti-actin: Sigma A1978, Clone AC15, 1:2,000; anti-tubulin: Sigma 
00020911, Clone TAT-1, 1:10,000. Proteins were detected using HRP-conjugated 
secondary antibodies (anti-rabbit, Thermo Fisher 31462, 1:50,000; anti-mouse, 
Thermo Fisher 31432, 1:50,000; Mouse TrueBlot ULTRA, Rockland 18-8817- 
33, 1:5000) and enhanced chemiluminescence (Clarity, Bio-Rad). Signals were 
acquired digitally on a Gel Doc XR system (Bio-Rad). 

Proteomics and mass spectrometry. Pellets collected from cultures of ~4 × 10⁷ 
CH12-F3 cells were lysed in BLB (Benzonase lysis buffer: 20 mM HEPES pH 
7.9, 40 mM KCl, 2 mM MgCl₂, 10% glycerol, 0.5% NP-40, 50 U/ml Benzonase 
(Novagen), 0.05% (v/v) phosphatase inhibitors (Sigma-Aldrich) and protease 
inhibitors (Complete EDTA-free, Roche)) and were incubated on ice for 30 min 
before a second incubation with adjusted salt (450 mM KCl). Flag-REV7 or control 
complexes were isolated from clarified lysates, following their dilution in NSB 
(no-salt buffer: 20 mM HEPES (pH 7.9), 10% glycerol, 0.5 mM DTT, 0.5 mM 
EDTA, 0.05% (v/v) phosphatase inhibitors (Sigma-Aldrich) and protease inhibitors 
(Roche)) to a final salt concentration of 125 mM. Flag–HA–REV7 complexes were 
immunopurified on anti-Flag (M2) magnetic resin (Sigma-Aldrich), washed extensively 
in wash buffer (BLB supplemented with 125 mM KCl and 0.1% NP-40) and 
eluted with 3× Flag peptide (Sigma-Aldrich). Flag-peptide-eluted complexes were 
reduced and alkylated using DTT and iodoacetamide, followed by a chloroform/ 
methanol precipitation. Proteins were resuspended in 6 M urea and digested overnight 
with trypsin (Promega). Peptides were desalted (Sola, Thermo) and analysed 
on an LC-MS platform consisting of a Dionex Ultimate 3000 nHPLC and 
Q-Exactive mass spectrometer. Peptides were separated on an EASY-Spray col- 
umn (50 cm, ES803, Thermo) with a gradient of 3-35% acetonitrile in 5% DMSO 
and 0.1% formic acid at 250 nl/min. MS1 spectra were acquired at a resolution 
of 70,000 at 200 m/z. Up to the 15 most abundant precursor ions were selected 
for subsequent MS/MS analysis after ion isolation with a mass window of 1.6 Th. 
Peptides were fragmented by HCD with 28% collision energy. Progenesis QI (v.3, 
Waters) was used for spectral counting and LFQ of the LC-MS/MS data with 
default parameters (top 3 quantitation mode). Proteins were identified with PEAKS 
8.0 (Bioinformatics Solutions) using standard parameters and the Uniprot mouse 
reviewed proteome (retrieved 28 November 2017). Peptide false discovery rate 
(FDR) was set to 1% with a resulting protein FDR of 1.88%. 

Chromatin immunoprecipitation. Each ChIP was performed from chromatin 
prepared from ~10⁷ CH12-F3 cells stimulated for 30 h with agonist CD40 antibody, 
IL-4, and TGFβ1 as previously described’. Each ChIP was performed using 30-50 
µg CH12-F3 chromatin using RPA34-20 (3 µg; Ab-3, Calbiochem) or anti-histone 
H3 (2 µg; ab1791, Abcam) coupled to 25 µl protein G Dynabeads (Life Technologies, 
10003D). Relative quantities of ChIP-enriched DNA were calculated relative to total 
input chromatin by qPCR in triplicate on CFX96 Real-Time Analyzer (Bio-Rad) 
or StepOnePlus (Applied Biosystems) instruments using Quantifast SYBR Green 
reagent (Qiagen) and locus-specific primer pairs (Extended Data Table 4a). 
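
A common way to express ChIP enrichment relative to total input chromatin is the percent-of-input calculation, sketched below. The source says only that quantities were calculated relative to input by qPCR in triplicate; the function name, the assumed 1% input fraction, the assumption of perfect (twofold per cycle) amplification efficiency and the Ct values are all hypothetical.

```python
import math
from statistics import mean


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered in a ChIP, from technical-replicate Ct values.

    ct_ip and ct_input are lists of Ct values for the IP and input samples.
    input_fraction is the fraction of chromatin set aside as input
    (for example 0.01 for 1%). Assumes 100% amplification efficiency.
    """
    # Shift the input Ct to what it would be for 100% of the chromatin.
    adjusted_input = mean(ct_input) - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input - mean(ct_ip))


# Hypothetical triplicate Ct values at one locus.
ip_ct = [27.1, 26.9, 27.0]
input_ct = [25.0, 25.1, 24.9]
print(f"{percent_input(ip_ct, input_ct, 0.01):.3f}% of input")
```

Running the same calculation for the histone H3 control ChIP at each locus gives a reference against which the RPA signal can be compared.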

Yeast two-hybrid interaction. Open reading frames encoding the indicated pro- 
teins were cloned into pAD-DEST and pBD-DEST vectors and transformed into 
Saccharomyces cerevisiae (strain PJ69-4A). Ten-millilitre cultures prepared from 
single transformants were diluted to equal volumes containing 2 × 10⁷ cells, and 
fivefold serial dilutions ‘spotted’ on control (−Leu, −Trp) or experimental (−Leu, 
−Trp, −His) plates supplemented with 3-amino-1,2,4-triazole (3-AT, 6 mM, Sigma 
A8056). Plates were incubated at 30°C for 3 days. 

PARPi resistance. KB1P-G3 and 53bp1−/− KB1P-G3 cells were infected with viral 
supernatant generated using lentiCRISPR-Bsr. Following selection in blasticidin 
(10 mg/ml), genomic DNA (gDNA) samples were collected immediately before 
seeding for olaparib sensitivity. Blasticidin-resistant populations were seeded in 
6-well plates at a density of 10⁴ cells per well (5 × 10³ for 53bp1−/− KB1P-G3) 
in the presence of olaparib or DMSO and grown at 37°C (5% CO₂ and 3% O₂). 
Medium was refreshed at 4 days and 8 days. After 10 days, cultures were expanded 
in fresh medium for 1 week in 6-cm dishes before collecting gDNA. Amplicons 
encompassing the edited locus were PCR-amplified from gDNA, Sanger sequenced 
(GATC Biotech) and analysed by tracking of indels by decomposition (https://tide.deskgen.com/). 
Surviving cells were collected and three replicates were plated in 
DMSO and three in olaparib for viability analysis. DMSO- and olaparib-treated 
cells were stained with crystal violet (0.5% (w/v) crystal violet in 25% methanol) 
after 8 days and 10 days growth (37°C, 5% CO₂ and 3% O₂), respectively. Crystal 
violet stained cells were dissolved in a 10% (v/v) acetic acid solution a minimum of 
24 h after staining and the OD595 was measured as a quantitative metric of relative 
growth. In Fig. 4g, cells were incubated for 24 h in olaparib 250 nM in standard 
medium, before metaphase chromosomes were collected. 
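
One simple way to turn the OD595 readings into a relative-growth value is to normalize each olaparib-treated well to the mean of the DMSO wells, as sketched below. The normalization scheme, the optional blank subtraction and the readings are assumptions for illustration; the source states only that OD595 was used as a quantitative metric of relative growth.

```python
from statistics import mean


def relative_growth(od_treated, od_control, od_blank=0.0):
    """Return each treated-well OD595 as a fraction of the mean control OD595.

    od_blank is an optional background reading (for example an empty well
    processed in parallel) subtracted from every value.
    """
    control = mean(od_control) - od_blank
    if control <= 0:
        raise ValueError("control signal must exceed the blank")
    return [(od - od_blank) / control for od in od_treated]


# Hypothetical triplicate readings for one genotype.
dmso = [1.20, 1.10, 1.30]
olaparib = [0.30, 0.36, 0.24]
ratios = relative_growth(olaparib, dmso)
print([round(r, 2) for r in ratios], round(mean(ratios), 2))
```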

Cytological analysis. Metaphase spreads were prepared by standard methods. In 
brief, detached cells were resuspended in 75 mM KCl for 20 min, before being fixed 
in Carnoy’s fixative. Approximately 20 µl of cell suspension was dropped onto clean 
slides and left to dry overnight. The cells were then stained with propidium iodide 
(0.5 µg/ml in PBS) for 20 min, rinsed and the slides mounted in Vectashield/DAPI 
(Vector Labs). The slides were analysed blind, and a minimum of 50 metaphases 
were acquired using an Olympus BX60 microscope for epifluorescence equipped 
with a Sensys CCD camera (Photometrics). Images were collected using Genus 
Cytovision software (Leica). 

Fluorescence in situ hybridization analysis. BAC probes (RP23-41J14 (Igh 3′); 
RP24-316H6 (Igh 5′) (gifts from the Wellcome Sanger Institute, Hinxton); 
RP23-374P12 (Igl 3′); RP23-382P9 (Igl 5′) (Source Bioscience)) were labelled using a 
nick translation kit (Abbott Molecular) according to the manufacturer's instruc- 
tions, incorporating either Chromatide Alexa Fluor 594-5-dUTP (Thermo Fisher 
Scientific), Chromatide Alexa Fluor 488-5-dUTP (Thermo Fisher Scientific), 
Gold-dUTP (Abbott Molecular) or biotin-16-dUTP (Sigma). The probes were 
resuspended in the presence of a tenfold excess of unlabelled mouse Cot-1 DNA 
(Thermo Fisher Scientific), in hybridization buffer (50% formamide, 10% dextran 
sulphate, 2x SSC), before being denatured for 8 min at 85°C, followed by 
a pre-annealing step of 30 min at 37°C. The metaphase spreads were denatured in 
0.07 N NaOH for 1 min. The probes were applied onto the slides, and the hybrid- 
ization was carried out overnight at 37°C. Three post-hybridization washes were 
performed, in 0.1x SSC buffer at 65°C. Biotinylated probes were detected using 
streptavidin-Cy5 (Thermo Fisher Scientific). Slides were mounted in Vectashield/ 
DAPI and analysed blind with the microscope described above. Between 98 and 
150 metaphases were analysed for each mouse with the exception of one case (45 
metaphases collected for one control mouse). 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability statement. The mass spectrometry proteomics data have been 
deposited to the ProteomeXchange Consortium via the PRIDE partner repository, 
with the dataset identifier PXD009650 (https://doi.org/10.6019/PXD009650). 


29. Ward, I. M., Minn, K., van Deursen, J. & Chen, J. p53 binding protein 53BP1 is 
required for DNA damage responses and tumor suppression in mice. Mol. Cell. 
Biol. 23, 2556-2563 (2003). 

30. Adams, I. R. & McLaren, A. Identification and characterisation of mRif1: a 
mouse telomere-associated protein highly expressed in germ cells and 
embryo-derived pluripotent stem cells. Dev. Dyn. 229, 733-744 (2004). 

31. Skarnes, W. C. et al. A conditional knockout resource for the genome-wide study 
of mouse gene function. Nature 474, 337-342 (2011). 

32. Khalaj, M. et al. A missense mutation in Rev7 disrupts formation of Polζ, 
impairing mouse development and repair of genotoxic agent-induced DNA 
lesions. J. Biol. Chem. 289, 3811-3824 (2014). 

33. Kikuchi, S., Hara, K., Shimizu, T., Sato, M. & Hashimoto, H. Structural basis of 
recruitment of DNA polymerase ζ by interaction between REV1 and REV7 
proteins. J. Biol. Chem. 287, 33847-33852 (2012). 

34. Wagner, S. A. et al. A proteome-wide, quantitative survey of in vivo ubiquitylation 
sites reveals widespread regulatory roles. Mol. Cell. Proteomics 10, 
M111.013284 (2011). 


© 2018 Springer Nature Limited. All rights reserved. 


[Extended Data Fig. 1 image. Recoverable labels: genotyping PCR bands Cond (475 bp), WT (314 bp) and Flox (255 bp) from ear punch, splenic B cells and bone marrow; Rev7 (~25 kDa) immunoblot of splenic B cells; panel c stimulations αCD40 + IL-4 (96 h) and LPS (96 h).] 
Extended Data Fig. 1 | CSR characterization in Rev7 conditional-knockout mice. 
a, PCR amplicons from genomic DNA obtained by ear biopsies or purified splenic 
B cells (left), or flow-cytometry-sorted cells from Hardy fractions A, B and C (right) 
from mice of the indicated genotype. Bands of different size correspond to the 
conditional Rev7 allele (cond, 475 bp), the wild-type Rev7 allele (wild type, 314 bp) 
and the floxed Rev7 allele (flox, 255 bp). Representative data; n > 3 experiments. 
b, Western blot analysis of REV7 protein expression in splenic B cells isolated from 
mice with the indicated genotype. Representative data; n = 2 experiments. For gel 
source data, see Supplementary Fig. 1. c, CTV-labelled purified B cells were stimulated 
as indicated and stained for surface IgG1 (left) or IgG2b (centre) or IgG3 (right) 
on day 4. Representative of n > 6 experiments. 




[Extended Data Fig. 2 image, panels a to e. Recoverable labels: flow-cytometry plots for Rev7+/+ Mb1+/Cre, Rev7F/F Mb1+/Cre and 53bp1−/− Mb1+/Cre bone marrow with Hardy fraction gates A to F; % annexin V positive cells; absolute cell number for total B220+, follicular and marginal zone B cells; sorted pro-B cells (B220+ CD43+; Hardy Fr. A-C) with 53BP1, REV7 and Ponceau S stain blots; mixed chimaera scheme (1:1 bone marrow mixture of CD45.1 wild-type donor and CD45.2 experimental donor, wild-type recipient, 7-8 weeks), absolute cell numbers and % CD45.1 : CD45.2 across INP and fractions A to F.] 
Extended Data Fig. 2 | B cell lineage developmental differences in 
Rev7- and 53bp1-deficient mice. a, Flow cytometry analysis of B cell 
development in the bone marrow of Rev7+/+ Mb1+/Cre, Rev7F/F Mb1+/Cre and 
53bp1−/− Mb1+/Cre mice; gating on B220+CD43+ (left, Hardy fractions 
A, B and C) and on B220+CD43− (right, Hardy fractions D, E and F). 
Representative data; n > 8 experiments. b, Apoptotic indices of total, 
pre-pro-B to pre-B, immature and mature B cell fractions in the bone marrow 
(left) and spleen (right) of Rev7+/+ Mb1+/Cre (n = 4), Rev7F/F Mb1+/Cre (n = 4) 
and 53bp1−/− Mb1+/Cre (n = 2) mice. c, Flow cytometric sub-classification of 
mature splenic B cell fractions in Rev7+/+ Mb1+/Cre (n = 8), Rev7F/F Mb1+/Cre 
(n = 8) and 53bp1−/− Mb1+/Cre (n = 9) mice. P values, unpaired Student’s 
two-tailed t-test. Bars represent mean ± 95% confidence interval. d, Top, 
indicated populations of pooled pro- to pre-B cell stage bone marrow 
lymphocytes (B220+CD43+; Hardy fraction A, B and C) from n = 2 mice 
per genotype were FACS-sorted and used to generate whole cell extracts. 
Bottom, immunoblot shows an absence of REV7 protein in extracts 
prepared from Rev7F/F Mb1+/Cre experimental bone marrow, when compared 
to extracts prepared from Rev7F/F Mb1+/+ (no Cre) controls, yet equivalent 
levels of 53BP1, histone H3 (loading control) and total protein (Ponceau S 
stain). For gel source data, see Supplementary Fig. 1. e, Left, diagram of the 
mixed bone marrow chimaera transplantation experiment. Bone marrow 
cells from a wild-type CD45.1+ donor mouse were combined with an 
equal number of bone marrow cells from an experimental CD45.2+ donor 
mouse, and injected into lethally irradiated recipient CD45.1+ mice (n = 8 
per genotype). After eight weeks, the recipient bone marrow was analysed 
for the relative contribution of CD45.1+ or CD45.2+ cells to reconstitute 
the recipient mice. Right top, enumeration of B cell precursors (as per 
Fig. 1g) of CD45.1+ (white circles) or CD45.2+ (from Rev7F/F Mb1+/Cre 
or 53bp1−/− Mb1+/Cre mice; black circles) cells in the bone marrow after 
reconstitution. Right bottom, stage-specific ratios of CD45.1+ to CD45.2+ 
grafted B cells for indicated mixed chimaeras. In parallel, an additional 
control experiment involving wild-type CD45.1 and Mb1+/Cre mixed 
chimaera was performed, resulting in equal CD45.1:CD45.2 reconstitution 
(data not shown). P values, multiple t-test with Holm-Sidak correction; 
mean ± 95% confidence interval. INP, input. 
binding motifs (RBM, and RBM)). Pairwise alignment of RBM, of REV3L 
code: 4FJO), pseudo-coloured according to amino acid conservation. 
V, variable residues; C, conserved residues. c, Protein threading model 
Extended Data Fig. 6 | SHLD3 (FLJ26957) mediates 53BP1-dependent 
NHEJ. a, Immunoblot showing levels of REV7, 53BP1 and RIF1 protein 
in whole cell lysates prepared from indicated CH12-F3 cell lines. 
Representative of n = 2 individual experiments. For gel source data, 
see Supplementary Fig. 1. b, Normal proliferation of stimulated wild 
type and Shld3−/−, Shld1−/− and Shld2−/− CH12-F3 cells. CFSE dye 
dilution assay. n = 2 independent experiments. c, Summary of IgM-to-IgA 
CSR frequencies in indicated control GST or HA–SHLD2(mouse)-complemented 
Shld2−/− CH12-F3 cells. Data normalized to CSR in 
wild-type cells. n = 4 independent experiments; mean ± s.d. NC, 
non-complemented. d, IgM-to-IgA CSR in indicated Shld3−/− CH12-F3 
cells complemented with wild-type SHLD3 or SHLD3(P53A, 
P58A) (that is, RBM, mutated). Data normalized to CSR in wild-type 
cells. Mean ± s.d. (n = 4). e, Summary of IgM-to-IgA CSR frequencies in 
Rev7−/− and Rev7−/−Shld3−/− double knockout CH12-F3 clones. Data 
normalized to CSR in wild-type cells. n = 4 independent experiments; 
mean ± s.d. f, Histone H3 (control) ChIP efficiencies at indicated Igh 
and control loci. This panel is related to RPA ChIP data shown in Fig. 4d. 
Representative data, n = 2 independent experiments, mean ± s.d., 2 or 3 
qPCR replicates. g, Indicated CH12-F3 lines stimulated with anti-CD40 
antibody, IL-4 and TGFβ1 (30 h) were subjected to ChIP experiments with 
RPA34 and histone H3 (control) antibodies. Representative data, n = 2 
independent experiments, mean ± s.d., 2 or 3 qPCR replicates. 
h, Cell cycle of indicated CH12-F3 lines after MMC treatment. 
Proportions (%) of sub-G1, G1, S and G2/M events. n = 3 independent 
experiments; mean ± s.d. i, Change (%) of Cas9-dependent indels at 
the indicated sgRNA locus in KB1P-G3 cells after outgrowth in DMSO 
or olaparib (300 nM) for 7 days. Representative data, n = 2 independent 
experiments. j, As in i but with 53bp1−/− KB1P-G3 cells. Representative 
data, n = 2 independent experiments. k, As in Fig. 4e but with 53bp1−/− 
KB1P-G3 cells. Representative data, n = 3 independent experiments. 
Extended Data Table 1 | Predicted and observed offspring of Rev7 mutant mice (Rev7tm1a(EUCOMM)Wtsi; Mouse Genome Informatics 
code: 4432091) bred in this study on a C57BL/6 background 

Cross: Rev7 heterozygote × Rev7 heterozygote 

Offspring (IMPC31)        136 
Offspring (this study)     25 
Combined offspring        161 

Genotype                                                   Rev7+/+ (wild type)   Rev7 heterozygous   Rev7 homozygous mutant 
Predicted offspring                                        40.25                 80.5                40.25 
Observed offspring                                         52                    109                 0 
Predicted offspring (corrected for homozygous lethality)   53.67                 106.26              0 

IMPC, International Mouse Phenotyping Consortium31. 
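
The predicted counts in the table follow from the 1:2:1 Mendelian ratio applied to the 161 combined offspring. A minimal sketch of that arithmetic, together with a chi-square goodness-of-fit statistic for the observed counts, is given below. The test itself is an illustration and is not reported in the source; the function name is hypothetical.

```python
import math


def chi_square_gof(observed, ratios):
    """Expected counts and chi-square statistic for a given genotype ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratios) for r in ratios]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat


# Observed offspring: wild type, heterozygous, homozygous mutant.
observed = [52, 109, 0]
expected, stat = chi_square_gof(observed, [1, 2, 1])

# With three classes there are 2 degrees of freedom, for which the
# chi-square survival function is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)

print(expected)
print(round(stat, 2), p_value)
```

The complete absence of homozygous mutants dominates the statistic, which is consistent with the table's correction for homozygous lethality.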
Extended Data Table 2 | Mutant mouse REV7 proteins generated and phenotyped in this study 


Safety belt 


Mouse 
mutant 
alleles 


Rev1- 
binding 
residues 


PTM/ 
Degron 


Conserved 
surface 
residues 


Further information about the REV7 mutants can be found in refs. 


REv7*°4 


REv7" 174 
REv7*o4™1 71A 


REv7"' 86A 
REV702004 


REV7*2024 
REV7"' 86A,Q200A 


REV7" ests 


K44R, K9OR, K162R, 
REV7 
K167R, K190R 


K44R, K46R, K47R, 
REV7 
K72R, K77R, K82R, K9OR, 
K97R, K129R, K162R, K167R, 
K190R, K198R, K208R 


REv7''9'4 
REv7''9'= 


REV7*' 29A 
REV7*' 29R 


REvV7"' 7A 


REv7"' 7R 
REv7="4 


REV7*="' R 
REv7=**4 


REv7=*4" 


REV75'204 


REV75'20R 


Predicted to block REV7- 
REV3L interactions and 
sensitizes DT40 cells to 
Cisplatin 


Mouse missense mutation 
predicted to disrupt DNA 
polymerase ζ formation; 
sensitizes cells to ICLs. 


Predicted to block REV7- 
REV1 interactions 


D-box mutation blocks 
APC/C-dependent Rev7 
degradation 


Identified REV7 E> caubadeamiaad 


E> caubadeamiaad 


Lysine-less REV7 mutant 


ee | phosphorylation 
ee | 


Conserved Lysine 


Conserved acidic patch 


16-18,32-34 


ree org 


Phosphosite.org 


Identified by 
structure-led 
conservation 
analysis 
Extended Data Table 3 | CRISPR-Cas9 reagents and edited cell-lines generated in this study 


Targeted to exon 1, AS 


Target gene} Clone 
+1 bp at R34 [-2 bp at R34 


+2 bp atL20 [+2 bp at L20 
C20orf196 / 
−5 bp at T57 | −5 bp at T57 
−1 bp at A35 | +32 bp at A35 
Fam35a / 
Shld2 −25 bp at 132 | −19 bp at N34 
−5 bp at R273 | −20 bp at G270 
−1 bp at W46 | −1 bp at W46 
Flj26957 / −2 bp at F42 | −7 bp at 144 
Shld3 −2 bp at Y49 | −2 bp at Y49 
−4 bp at Y49 | −13 bp at K53 
Flj26957 in −23 bp at F38 | −23 bp at F38 
Rev7−/− (C) −22 bp at 38 | −22 bp at F38 
KB1P-G3 Flj26957 −1 bp at W46 | −8 bp at W46 
(Brca1−/− p53−/−) fs | −1 bp at W46 | −5 bp at W46 


a, Sequences of sgRNA used in gene editing experiments. S indicates sgRNA corresponds to sense strand; AS indicates antisense strand. b, Individual edited alleles in each cell-line clone as confirmed 
by Sanger sequencing. 
Extended Data Table 4 | Primers used in this study 


a 
Target/gene locus Primer Sequence (5’-3’) 
TCCAGTGTGCAAGAAAGCTAAAT: 

Rps0 CCAGTGTGCAAGAAAGC G 
GGCAGTGCGTGGAGACTCA 
F AATGTGGTTTAATGAATTTGAAGTTGCCA 

A(Igh Sy) Fwd = JC GTGG G GAAGTTG 
TCTCACACTCACCTTGGATCTAAGCACTGT 


GCTAAACTGAGGTGATTACTCTGAGGTAAG 
B (Igh Sy) 


GTTTAGCTTAGCGGCCCAGCTCATTCCAGT 


AGTGTGGGAACCCAGTCAAA 
C (Igh Sy1) 


GTACTCTCACCGGGATCAGC 


TGAAAAGACTTTGGATGAAATGTGAACCAA 
D (igh Sa) 


GATACTAGGTTGCATGGCTCCATTCACACA 


Rev |TGGCAGCAGAAAGAGAAGGG 


Shlid1 
[Fwd d JAGTAGCTGCTCTTTTGGCGT 
: 


Rev |TGGCAGCAGAAAGAGAAGGG 


AGCCCACACATTTGTCCACT 


C20orf196 / 


AGTAGCTGCTCTTTTGGCGT 


Rev |{GCTCCAGTTGCTCCACTGAA 


Fam35a / TCCTTGGCTTCTTGGACACC 


Shid2 Rev |ATGGAGGCAGAACCAACAGG 


TGACCTTGAGCCTGTTCCAC 
pe Rev |TGGAGTTGGAGCAGTTGCAG 
TACTGCTTCACGCTCTCAGC 
GCAGGCTGTCCCTACCAAAT 
TACTGCTTCACGCTCTCAGC 
GCAGGCTGTCCCTACCAAAT 


a, Locus-specific ChIP amplicon qPCR primer sequences. b, Primers used to amplify CRISPR-Cas9 target loci in edited cell lines. 
Structures of human Patched and its complex with 
native palmitoylated sonic hedgehog 


Xiaofeng Qi, Philip Schmiege, Elias Coutavas, Jiawei Wang & Xiaochun Li 


Hedgehog (HH) signalling governs embryogenesis and adult tissue 
homeostasis in mammals and other multicellular organisms1–3. 
Whereas deficient HH signalling leads to birth defects, 
unrestrained HH signalling is implicated in human cancers***, 
N-terminally palmitoylated HH releases the repression of Patched 
to the oncoprotein smoothened (SMO); however, the mechanism 
by which HH recognizes Patched is unclear. Here we report cryo- 
electron microscopy structures of human patched 1 (PTCH1) 
alone and in complex with the N-terminal domain of ‘native’ sonic 
hedgehog (native SHH-N has both a C-terminal cholesterol and an 
N-terminal fatty-acid modification), at resolutions of 3.5 Å and 
3.8 Å, respectively. The structure of PTCH1 has internal two-fold 
pseudosymmetry in the transmembrane core, which features a 
sterol-sensing domain and two homologous extracellular domains, 
resembling the architecture of Niemann-Pick C1 (NPC1) protein’. 
The palmitoylated N terminus of SHH-N inserts into a cavity 
between the extracellular domains of PTCH1 and dominates the 
PTCH1-SHH-N interface, which is distinct from that reported 
for SHH-N co-receptors®. Our biochemical assays show that 
SHH-N may use another interface, one that is required for its co- 
receptor binding, to recruit PTCH1 in the absence of a covalently 
attached palmitate. Our work provides atomic insights into the 
recognition of the N-terminal domain of HH (HH-N) by PTCH1, 
offers a structural basis for cooperative binding of HH-N to various 
receptors and serves as a molecular framework for HH signalling 
and its malfunction in disease. 

The HH precursor undergoes autocatalytic processing in the endo- 
plasmic reticulum to release an amino-terminal signalling domain 
(HH-N) with cholesterol covalently coupled to its carboxyl terminus. 
Hedgehog acyltransferase then adds palmitate to the α-amino group of 
the N-terminal-specific cysteine to yield the mature, doubly lipidated 
signalling molecule”!°. N-terminal palmitoylation is indispensable 
for HH signalling: (1) fatty-acylated SHH-N is far more active than 
unacylated SHH-N, as determined from differentiation assays and HH 
signalling assays’»!*; (2) blocking HH-N palmitoylation (by mutation 
of its palmitoylation site) was shown to affect embryonic development 
in Drosophila and in mice!®’3; and (3) inhibitors of hedgehog acyltrans- 
ferase that prevent the palmitoylation of SHH block HH signalling". 

Human patched 1 (PTCH1), the primary receptor for HH-N 
ligands, consists of 1,447 amino acids, including 12 transmembrane 
helices and three approximately 30-kD soluble domains, namely two 
extracellular domains (ECD-I and ECD-II) that bind HH-N and one 
cytoplasmic carboxyl-terminal domain (CTD) (Fig. 1a, Extended 
Data Fig. 1). In addition, transmembrane helices 2-6 (TM2-TM6) of 
Patched are predicted to form a sterol-sensing domain (SSD), which, in 
other proteins such as NPC1 and HMG-CoA reductase, is involved in 
cholesterol metabolism and signalling’. Unliganded Patched inhibits HH 
signalling and this repression is released when HH binds to Patched". 
Specifically, after HH binding, Patched releases its inhibition of SMO, 
a polytopic membrane receptor that activates the Gli transcription 
factors to upregulate HH target genes”. How Patched inhibits SMO is
unknown, but there are studies that show that Patched may act indi- 
rectly by releasing a small molecule to regulate SMO'”"®. In support of 
this model, Patched has a similar transmembrane topology to NPC1 and 
prokaryotic resistance-nodulation-cell division (RND) transporters, 
which transport ligands across membranes”””. 

Two major gaps remain in our knowledge of the HH pathway: (1) the 
molecular details of how HH recognizes and binds Patched; and (2) the 
mechanism of SMO activation after HH binds Patched. More importantly,
Patched is a tumour suppressor involved in basal cell carcinoma,
medulloblastoma and primitive neuroectodermal tumours’, and
mutations of SHH and PTCH1 can also cause developmental defects’.
SMO is a target of antitumour agents”’. Because this pathway is associated
with human diseases, structural knowledge of Patched and the
Patched-HH complex is crucial not only for elucidating the mechanism
of signal transduction, but also for understanding the pathology of
mutants and for the development of potential therapeutics for human
diseases.

Fig. 1 | The engineered human PTCH1 protein binds a SHH-N ligand.
a, Primary structure of PTCH1. Residues 619-720 and 1189-1447 were
removed in PTCH1*. b, HH signalling in Ptch1−/− mouse embryonic
fibroblasts (MEFs) transfected with pcDNA3.1, full-length PTCH1
(FL-PTCH1) or PTCH1* and response to wild-type or C24S mutant
SHH-N ligand via luciferase activity. SHH-N in conditioned medium and
transiently expressed PTCH1 were detected by western blotting. Calnexin
served as an internal control and was detected by anti-calnexin antibody.
SHH-N or SHH-N C24S was added to stimulate HH signalling. c, Pull-down
assay of N-His-tagged SHH-N (left) or native SHH-N (right) with
PTCH1* at different molar ratios detected by Coomassie staining. The
assay was reproduced three times with similar results. d, Palmitoylated
SHH-N stimulates HH signalling, but unmodified SHH-N does not. SHH
Light II cells were treated with various concentrations of SHH-N variants,
and HH signalling was measured using luciferase activity. Data (b and d)
are mean ± s.d. (n = 3).

The full-length human PTCH1, expressed with a C-terminal Flag
tag in human embryonic kidney (HEK)-293S cells, is eluted in the
void volume during gel filtration (Extended Data Fig. 2a). To make the
protein amenable to structural studies, we truncated the cytoplasmic
loop between TM6 and TM7 and the CTD of PTCH1. A recent study
on PTCH1 revealed that simultaneous deletion of both its TM6-TM7
internal loop and its cytoplasmic domain did not affect PTCH1-dependent
repression of SMO activity in PTCH1-deficient MEFs or
normal localization in cilia”’. This suggests a structural or mechanistic
interaction between the TM6-TM7 internal loop and the CTD,
because deletion of the CTD in combination with this loop restores
normal activity. This PTCH1 variant (PTCH1*) has better solubility
(Extended Data Fig. 2b). To test the function of PTCH1*, either
PTCH1* or full-length PTCH1 was transfected into PTCH1-deficient
MEFs. HH reporter assays reveal that, similar to wild-type PTCH1,
PTCH1* can repress HH signalling. They also reveal that treatment
with conditioned medium containing palmitoylated SHH-N without
the cholesterol modification, but not the C24S palmitoylation-site
SHH-N mutant, can release this repression (Fig. 1b).

We assembled the PTCH1*-SHH-N complex using unmodified
SHH-N with an N-terminal His tag purified from Escherichia coli, or
SHH-N with a C-terminal cholesterol and an N-terminal fatty-acid
modification purified from HEK-293 cells (termed ‘native’ SHH-N
hereafter) (Fig. 1c). The native SHH-N, but not unmodified SHH-N,
formed a stable complex with PTCH1*, with SHH-N being detected
at a 1:1 ratio (Fig. 1c). We also measured HH signalling activity in cells
by adding either conditioned media or purified SHH-N proteins to
SHH Light II cells that carry a Gli reporter plasmid. The results show
that palmitoylated SHH-N, but not the C24S SHH-N mutant or N-His-tagged
SHH-N, can stimulate HH signalling considerably (Fig. 1d).
This suggests that PTCH1* is able to bind the native SHH-N, allowing
us to make a physiological complex with purified PTCH1* in vitro
(Extended Data Fig. 2c).

Fig. 2 | Overall structure of PTCH1*. a, Ribbon representation of the
structure horizontal to the membrane. Flexible linkers are indicated by
dots. b, Structural comparison of transmembrane domains of PTCH1*
(blue and green) and NPC1 (brown; PDB code 5U74) viewed from the
extracellular side. c, Structural comparison of transmembrane domains
of PTCH1* (blue and green) and an AcrB trimer (PDB code 1IWG). One
subunit of AcrB is yellow; the rest are grey. d, SSD comparison of PTCH1*
(blue) and NPC1 (brown; PDB code 5U74) in a similar view to the right
panel of a. Red arrows indicate shifted helices. A surface representation
of the unidentified molecule is shown in yellow. e, Overall structure of
ECD-I and ECD-II. f, Interface between ECD-I and ECD-II. Hydrophilic
interactions are indicated by dots and residues are coloured as in e.

We determined the structure of PTCH1* to 3.5 Å resolution (Fig. 2a,
Extended Data Figs. 3, 4, Extended Data Table 1). PTCH1*, a monomer
in solution, measures 110 Å × 60 Å × 40 Å (Fig. 2a). The structure
exhibits pseudosymmetry across the 12 transmembrane helices and 
features two homologous extracellular domains (Fig. 2a). The trans- 
membrane domain of Patched has a similar topology to NPC1 and to 
prokaryotic RND transporters”!”, one of which is AcrB (Fig. 2b, c). 
Previous studies have suggested that PTCH1 could form oligomers 
mediated by its CTD’; because this part of the molecule has been 
removed in our construct, we focus our discussion on monomeric 
PTCH1*. 

Previous crosslinking studies have suggested that the SSD of NPC1 
may bind a small ligand??. We have shown previously that a cavity 
in the SSD of NPC1 is large enough to accommodate a cholesterol 
molecule’. A corresponding pocket is observed in the SSD of PTCH1* 
(Fig. 2d, Extended Data Fig. 5). This hydrophobic pocket opens to 
the extracellular space and plasma membrane and measures roughly 
20 Å × 10 Å × 10 Å (Fig. 2d). Remarkably, we observed a rod-shaped
density in this pocket (Fig. 2d), which is distinct from detergent 
micelles and other noise on the basis of its local resolution. We specu- 
late that the density might derive from an endogenous sterol derivative 
or another lipid. Structural comparison of PTCH1* and NPC1 reveals 
that the transmembrane helices of the N-terminal half of PTCH1* 
converge more closely than those in NPC1, potentially owing to an 
interaction with this unidentified ligand (Fig. 2d). 

ECD-I and ECD-II overlap with each other with a root-mean-square
deviation (r.m.s.d.) of 3.8 Å (Cα atoms) and resemble the middle and
C-terminal lumenal domains of NPC1*. Each ECD consists of two
subdomains: subdomain 1 ranges from the cell membrane to the middle
of each ECD, with three β strands and two α helices providing the
main interface between the two extracellular domains; each subdomain 2
ranges from the middle to the top of each ECD (Fig. 2e).
There are five residues of the swapped ECD-I α1 helix that interact with
six residues in ECD-II in addition to several hydrophobic interactions.
By contrast, ECD-II α1 (labelled as α1*) forms only one hydrophilic
bond with T426 in α4 of ECD-I (Fig. 2f). Moreover, ECD-I contains
70 amino acids more than does ECD-II, and subdomain 2 of ECD-I
contains more loops than does subdomain 2 of ECD-II (Extended Data
Fig. 1). Together, these features confer a flexible character to ECD-I for
ligand binding.

Fig. 3 | Structure of PTCH1*-SHH-N complex. a, Ribbon representation
of the structure horizontal to the membrane with PTCH1*, coloured as
in Fig. 2 and with SHH-N coloured in cyan. An SHH-N-bound zinc atom
is indicated by a grey sphere. Putative endogenous molecules from the
cryo-EM density map are shown at the 5σ level as a red mesh. b, Structural
comparison of the membrane domains of apo-PTCH1* (grey) and
PTCH1*-SHH-N (coloured). c, The palmitate-binding site. ‘Np’ denotes
the N-terminal peptide of SHH-N (residues 24-38). d, Interface between
the Np of SHH-N and ECD-I compared with apo-PTCH1*. e, Secondary
interface between SHH-N and ECD-I subdomain 2 compared with
apo-PTCH1*. Red arrows represent structural shifts.

Fig. 4 | The palmitoylated N terminus of SHH-N dominates its interface
to PTCH1*. a, The palmitoylated N terminus of SHH-N is important for
PTCH1* binding. A 2:1 PTCH1*:SHH-N molar ratio was used for assays.
b, Calcium facilitates the binding between PTCH1 and N-terminal His-tagged
SHH-N. c, Complex of SHH-N and 5E1 (PDB code 3MXW), with interaction
areas on SHH-N shown in red. Calcium atoms in the interface are indicated
by green balls. d, 5E1 blocks the binding of PTCH1* to His-tagged SHH-N
(right) but not to native SHH-N (left). e, Mutagenesis of the secondary
interface of SHH-N. f, Mutagenesis of the interface of PTCH1*. The assays
(a, b, d-f) were reproduced three times with similar results. g, Repression
of HH signalling by PTCH1 with AAAA mutations (PTCH1-AAAA) and
its response to a SHH-N ligand. SHH-N in conditioned medium is shown
in Fig. 1d. HH activity was measured using a luciferase assay and data are
mean ± s.d. (n = 3). The protein was detected by Coomassie staining (a, b, d)
or by western blotting (e-g). WT, wild-type.

The structure of the PTCH1*-SHH-N complex was determined 
to 3.8 Å resolution (Fig. 3a, Extended Data Figs. 6, 7, Extended Data
Table 1). The C terminus of SHH-N (residues 187-197), which was 
also invisible in the SHH-N crystal structure (Protein Data Bank 
(PDB) code 3M1N), could not be resolved. This result supports our cell 
biological assays that cholesterol modification of HH-N is not necessary 
to stimulate the HH signal (Fig. 1b, d). In addition to the molecule 
observed in the SSD, there is another endogenous density close to 
TM12 (Fig. 3a). This helix is slightly more tilted than in the apo- 
structure, potentially using the guanidine group of Arg1150 to bind the 
polar head of this putative lipid (Fig. 3b). There is no substantial confor- 
mational change between the transmembrane regions of PTCH1* alone 
and the PTCH1*-SHH-N complex (Fig. 3b). This suggests that SHH-N 
binding may not abolish SSD-mediated substrate binding of PTCH1. 

Native SHH-N engages two binding sites on the ECD-I of PTCH1* 
(Fig. 3a). The primary interface involves the N-terminal peptide of 
SHH-N (residues 24-38), with fatty-acid modification that fits into 
the space between subdomains 1 of ECD-I and ECD-II (Fig. 3c). This 
binding site includes a strong stretch of density that extends beyond 
the N-terminal Cys24 and that is contiguous with the protein. Mass 
spectrometry has identified this extension on the SHH-N that we used 
as primarily palmitoylation and to a smaller degree lauroylation and
myristoylation, consistent with previous observations”. On the basis 
of this result and the shape of the density, we assign the N-terminal 
density as a palmitoyl moiety (Fig. 3c, Extended Data Fig. 7e). 
Compared with the apo-PTCH1* structure, the α3 and α3′ helices
and their connecting loop in ECD-I are shifted towards the membrane
side, providing more space for this insertion (Fig. 3d). The loop that
connects α1 and β1 of ECD-I (residues 148-153) undergoes a conformational
change that allows palmitate to expand the space between
α1 of ECD-I and α4 of ECD-II (Fig. 3d). Hydrophobic residues from
extracellular domains form extensive van der Waals interactions
with this modification (Fig. 3c). The secondary binding site involves
α helices α1 and α2 of SHH-N, whereas PTCH1* engages its ECD-I
(Fig. 3e). A recent study showed that a short palmitoylated N-terminal 
fragment (residues 24-45) of SHH-N could partially activate HH signalling
by binding PTCH1¹¹. Our work shows that native SHH-N
indeed forms a more stable complex with PTCH1* than does unpal- 
mitoylated SHH-N (Fig. 1c) and our structure confirms the interaction 
between the SHH-N palmitate and PTCH1* (Fig. 3c). 

The interface that we observed between the native SHH-N and 
PTCH1* was inconsistent with a previously reported interface of 
SHH-N and PTCH1* that includes the zinc-binding site (Fig. 3a), 
which is also able to accommodate calcium and putatively binds to 
PTCH1?!6?7. To resolve this contradiction, we purified N-His-tagged 
SHH-N without the palmitate from E. coli as previously reported”®; 
however, our cell-based HH reporter assays show that the N-His-tagged 
SHH-N lost almost all HH signalling activity (Fig. 1d). Compared with 
native SHH-N, removing the palmitate modification of SHH-N or 
deleting the N terminus (residues 24-36) weakened SHH-N binding 
to PTCH1* (Fig. 4a). Interestingly, binding between N-His-tagged 
SHH-N and PTCH1* could be enhanced by Ca²⁺; however, the binding
between native SHH-N and PTCH1* was not affected by Ca²⁺
(Fig. 4b). 

We performed further SHH-N competition assays with PTCH1* 
and 5E1, a monoclonal anti-SHH-N antibody of nanomolar binding 
affinity that is used to block HH signalling by binding PTCH1® 
(Fig. 4c). Our structural analysis predicts that 5E1 should not interfere 
with the palmitate-dominated interface to PTCH1*. To validate this 
point, we performed a pull-down assay in the presence of 5E1 to
determine whether it competes with PTCH1* to bind native SHH-N
or N-terminally tagged SHH-N. SHH-N was mixed with 5E1 before
being incubated with PTCH1* immobilized on a Flag M2 resin. A
PTCH1*-SHH-N-5E1 trimeric complex could be eluted, which
suggests that binding of 5E1 to native SHH-N does not block its access
to the observed PTCH1* interface. By contrast, 5E1 successfully
competed with N-His-tagged SHH-N to bind PTCH1* with and without
Ca²⁺ (Fig. 4d).

Fig. 5 | Putative multivalent complex of SHH-N with PTCH1 and its
co-receptors. a, b, Complex of SHH-N and CdoFn3 (PDB code 3D1M)
(a) and of SHH-N and IhogFn3 (PDB code 2IBG) (b), with interaction
areas on SHH-N shown in red. Hypothetical models of the PTCH1*-SHH-N-Cdo
complex or PTCH1*-SHH-N-Ihog complex (bottom
panels) were generated by docking SHH-N-CdoFn3 or SHH-N-IhogFn3
to the PTCH1*-SHH-N structure. c, Model of the putative collaboration
between SHH-N-PTCH1 and co-receptors or PTCH1 itself in HH
signalling.

To exclude the possibility that detergent may have had an undesirable 
influence in our system, we also reconstituted PTCH1* with amphipols, 
which stabilize membrane proteins in solution, and repeated the 
competition assays (Extended Data Fig. 8a). The results show 
that, as in the presence of detergents, the PTCH1*-SHH-N-5E1 trimeric
complex could be detected in a detergent-free environment 
(Extended Data Fig. 8b). Therefore, our data suggest that the palmitoylated 
N terminus of SHH-N is an integral part of the native SHH-N-PTCH1* 
interface and that when its palmitoylated N terminus is absent the 
5E1-binding interface (including R153 and the Ca²⁺-binding site) may
dominate in PTCH1* binding (Fig. 4c). 

To verify the secondary interface further, we used unmodified 
SHH-N without any lipidation or tag at the N terminus and further 
blocked the previously reported interface on SHH-N with 5E1. In 
our binding assays, unmodified SHH-N still bound PTCH1* by the 
secondary interface in the presence of 5E1, which blocks the previously 
reported interface (Fig. 4e). We then introduced mutations on helix 
α1 of SHH-N(I111E/N115K), which abolished binding to PTCH1*
(Fig. 4e). The HH reporter assays show that the SHH-N(I111E/N115K)
mutant-conditioned medium lost more than 70% of activity compared
with wild-type SHH-N (Fig. 1d). These data may explain why mutation
of I111 or N115 leads to HPE-3, possibly by altering how PTCH1
recognizes SHH-N. We also introduced alanine mutations on the
ECD-I (E221A/Y222A/L223A/Y224A) of PTCH1* (shown as AAAA
and L254A/W256A in Fig. 3e in red). The binding assays show that
these two mutants have weaker binding to the SHH-N-5E1 complex,
further supporting our structural observations (Fig. 4f). The HH
reporter assays also show that in Ptch1−/− MEFs the full-length PTCH1
with AAAA mutations can repress HH signalling but cannot recognize 
SHH-N to release the inhibition (Fig. 4g). 

Here, we have reported two structures and related structure-guided 
experiments that together reveal that PTCH1* recognizes native 
SHH-N by two distinct binding sites. The PTCH1*-SHH-N interface 
that we describe here has important implications for how SHH-N
recognizes and interacts with other proteins in various signalling 
pathways, including Ihog (interference hedgehog), Cdo (cell-adhesion- 
molecule-related, downregulated by oncogenes), Boc (brother of Cdo) 
and Hhip (hedgehog-interacting protein)*. These co-receptors func- 
tion in the recognition and localization of HH in various cell types”®””. 
Previous studies have suggested that HH-N could form a complex 
with Patched and its co-receptor?®3°. The PTCH1*-SHH-N complex 
structure that we determined indeed allows for SHH-N to interact with 
an additional co-receptor to form multivalent complexes (Fig. 5a, b). 
This architecture is corroborated by our finding that the interaction of 
SHH-N with the antibody 5E1, which binds the same area of PTCH1 
as does the SHH-N co-receptors, does not interfere with the formation 
of the PTCH1*-SHH-N complex (Fig. 4d). We propose a PTCH1- 
SHH-N working model: SHH-N initially recognizes PTCH1 through 
its palmitoylated N terminus; subsequently, SHH-N co-receptors 
or another PTCH1 binds SHH-N at a distinct interface to further 
regulate HH signalling (Fig. 5c). This model could provide a possible 
mechanism for how HH-N co-receptors and PTCH1 orchestrate HH 
signalling. The other aspect of Patched signalling—the mechanism for 
inhibition of SMO—remains poorly understood. Further investigations 
are required into how the HH-N ligand affects the putative transport 
activity of Patched. 
METHODS


Protein expression and purification. The constructs of human patched 1 were 
cloned into pEG BacMam with a C-terminal Flag tag. The protein was expressed 
using baculovirus-mediated transduction of mammalian HEK-293S GnTI- cells 
(ATCC). The cell lines tested negative for mycoplasma contamination. At 48 h 
post-infection at 37°C, cells were disrupted by sonication in buffer A (20 mM 
Hepes pH 7.5, 150 mM NaCl) with 1 mM phenylmethanesulfonyl fluoride (PMSF), 
10 μg ml⁻¹ leupeptin. After low-speed centrifugation, the resulting supernatant was
incubated in buffer A with 1% (w/v) n-dodecyl-β-D-maltoside (DDM, Anatrace)
for 1 h at 4°C. The lysate was centrifuged again and the supernatant was loaded
onto an anti-Flag M2 affinity column (Sigma). After washing twice, the protein
was eluted in buffer A with 0.1 mg ml⁻¹ Flag peptide, 0.02% DDM, and concentrated.
The concentrated protein was purified by Superdex-200 size-exclusion
chromatography (GE Healthcare) in buffer B (20 mM Hepes pH 7.5, 150 mM
NaCl and 0.06% (w/v) Digitonin (Sigma)). The peak fractions were collected and
concentrated to 5-7 mg ml⁻¹ for grid preparation. Mass spectrometry and
anti-Flag-tag western blotting confirmed the identity of the protein. To assemble the
PTCH1*-SHH-N complex, native SHH-N (purchased from R&D Systems, catalogue
number 8908-SH/CF) was mixed with purified PTCH1* at a 1:1 molar ratio
and purified by Superdex-200 size-exclusion chromatography (GE Healthcare) in
buffer B. The peak fractions were collected and concentrated to 5-7 mg ml⁻¹ for
grid preparation.

For preparation of detergent-free protein, PTCH1* was purified as above and 
then mixed with Amphipol A8-35 (Anatrace) at a 1:3 mass ratio for 4h. This mixture 
was incubated with Bio-Beads (Bio-Rad) overnight before further purification 
by gel filtration with buffer A. The mutated and truncated DNA constructs were 
generated using two-step PCR or Gibson assembly (NEB). 

Three constructs were cloned into the pET21b vector: (1) human SHH-N (resi- 
dues 24-197) with N-terminal His tag; (2) human SHH-N (residues 25-197) with 
C-terminal His tag; and (3) human SHH-N (residues 37-197) with C-terminal His 
tag. All of the constructs and SHH-N variants were then transformed into E. coli 
BL21 (DE3) for expression. The transformed bacteria were grown in LB medium 
supplemented with ampicillin at 37°C and induced by 0.2 mM isopropyl β-D-thiogalactopyranoside
(IPTG) overnight at 25°C. The cells were harvested and lysed
by sonication in buffer C (20 mM Hepes pH 7.5, 500 mM NaCl) supplemented 
with 1 mM PMSF. The lysate was centrifuged at 18,000 r.p.m. for 30 min and the 
supernatant was loaded onto a Ni²⁺-NTA affinity column (Qiagen). After washing
three times with 20 mM, 40 mM and 80 mM imidazole in buffer C, the protein 
was eluted in buffer C plus 250 mM imidazole and further purified by gel filtration 
using Superdex-200 size-exclusion chromatography (GE Healthcare) in buffer 
C. Peak fractions were collected for pull-down assay. 

Electron microscopy sample preparation and imaging. A freshly purified pro- 
tein sample was added to Quantifoil R1.2/1.3 400 mesh Au holey carbon grids 
(Quantifoil), blotted using a Vitrobot Mark IV (FEI), and frozen in liquid ethane.
The grids were imaged in a 300-keV Titan Krios (FEI) with a Gatan K2 Summit
direct electron detector (Gatan). Data were collected at 1 Å per pixel with a dose
rate of 8 electrons per physical pixel per second. Images were recorded for 10-s
exposure in 50 subframes to give a total dose of 80 electrons per Å².
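
The dose figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol; it assumes that one physical detector pixel corresponds to one 1 Å image pixel (no super-resolution binning), and all function and variable names are illustrative.

```python
# Worked dose arithmetic for the imaging settings quoted in the text.
# Assumption: one physical pixel maps to one 1 Å x 1 Å image pixel.
PIXEL_SIZE_A = 1.0   # Å per pixel (from the text)
DOSE_RATE = 8.0      # electrons per physical pixel per second (from the text)
EXPOSURE_S = 10.0    # seconds per image (from the text)
N_SUBFRAMES = 50     # subframes per image (from the text)


def total_dose_per_a2(dose_rate, exposure_s, pixel_size_a):
    """Total accumulated dose in electrons per square ångström."""
    pixel_area_a2 = pixel_size_a ** 2
    return dose_rate * exposure_s / pixel_area_a2


total_dose = total_dose_per_a2(DOSE_RATE, EXPOSURE_S, PIXEL_SIZE_A)
dose_per_subframe = total_dose / N_SUBFRAMES
time_per_subframe = EXPOSURE_S / N_SUBFRAMES

print(f"total dose: {total_dose:.1f} e/Å^2")
print(f"dose per subframe: {dose_per_subframe:.2f} e/Å^2")
print(f"time per subframe: {time_per_subframe:.2f} s")
```

Under these assumptions the total agrees with the 80 electrons per Å² stated above, spread evenly over the 50 subframes.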

Imaging processing and 3D reconstruction. Dark subtracted images were first 
normalized by gain reference that resulted in a pixel size of 1 Å per pixel. Drift correction
was performed using the program Unblur*?. The contrast transfer function
(CTF) was estimated using CTFFIND4™. To generate PTCH1* templates for auto- 
matic picking, around 2,000 particles were manually picked and classified by 2D 
classification in RELION™. After auto-picking in RELION, the low-quality images 
and false-positive particles were removed manually. About 790,000 particles were 
extracted for subsequent 2D and 3D classification. 3D classification was carried out 
in RELION to generate the initial model of PTCH1*, using the cryo-EM structure 
of human NPC1 (Electron Microscopy Data Bank, EMD-6640) low-pass-filtered
to 60 Å as the initial model. The PTCH1* model of best class after 3D classification
was used as the initial model for the final 3D classification in RELION. The best 
class, containing around 168,000 particles, provided a 7.7 Å map after 3D auto-refinement
in RELION. Motion correction of all particles was performed using the
program alignparts_lmbfgs™*. As in a previously published approach*, refinement
was performed in FREALIGN* using this best class as the initial model. The global 
search was performed once followed by 10-20 rounds of local search without mask. 
The best class without mask refinement was selected to generate the mask using 
relion_mask_create with 6 Å extensions excluding the micelle. This mask was then
used to perform another global search followed by 10-20 rounds of local search 
with a cosine edge width of 6 Å and a BSC value of 10 to exclude bad particles. The
final map is estimated to be 3.5 Å using the 0.143 cut-off criterion.
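
To illustrate how a 0.143 cut-off turns a Fourier shell correlation (FSC) curve into a single resolution number, here is a minimal sketch. It is not the FREALIGN or RELION implementation; the function name is hypothetical, and the example curve consists of made-up illustrative values, not data from this study.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (Å) at which the FSC first drops below threshold.

    spatial_freqs: increasing spatial frequencies in 1/Å, one per Fourier shell.
    fsc_values: FSC value for each shell, in the same order.
    """
    for i, value in enumerate(fsc_values):
        if value < threshold:
            if i == 0:
                return 1.0 / spatial_freqs[0]
            # Linear interpolation between the last shell at or above the
            # threshold and the first shell below it.
            f_prev, f_curr = spatial_freqs[i - 1], spatial_freqs[i]
            v_prev = fsc_values[i - 1]
            t = (v_prev - threshold) / (v_prev - value)
            return 1.0 / (f_prev + t * (f_curr - f_prev))
    # The curve never crosses the threshold: limited by the last shell sampled.
    return 1.0 / spatial_freqs[-1]


# Hypothetical FSC curve, for illustration only.
example_freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
example_fsc = [0.99, 0.97, 0.90, 0.75, 0.45, 0.10, 0.02]
example_resolution = resolution_at_threshold(example_freqs, example_fsc)
print(f"estimated resolution: {example_resolution:.2f} Å")
```

The interpolation is a design choice for the sketch; reporting the first shell below the threshold without interpolation is an equally common convention.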

To generate templates of the PTCH1*-SHH-N complex for automatic picking, 
around 5,000 particles were picked manually and classified by 2D classification 
in RELION**. After auto-picking in RELION, the low-quality images and false- 
positive particles were removed manually. About 661,000 particles were extracted 
for subsequent 2D and 3D classification. 3D classification was carried out in
RELION using the cryo-EM structure of PTCH1* low-pass-filtered to 60 Å as
the initial model. The complex model of best class after 3D classification was 
used as the initial model for the final 3D classification in RELION. The best class, 
containing 195,000 particles, provided a 7.1 Å map after 3D auto-refinement in
RELION. After motion correction of individual particles, the final refinement was 
performed in FREALIGN* using this best class as the initial model. The global 
search was performed once followed by 10-20 rounds of local search without mask. 
The best class without mask refinement was selected to generate the mask using 
relion_mask_create with 6 Å extensions excluding the micelle. This mask was then
used for performing another global search followed by 10-20 rounds of local search 
with a cosine edge width of 6 Å. The final map is estimated to be 3.8 Å using the
0.143 cut-off criterion.

Model construction. To obtain better side-chain densities for model building, we 
sharpened the map of PTCH1* using bfactor.exe (author, Nikolaus Grigorieff) with 
a resolution limit of 3.5 Å and a B-factor value of −100 Å². The entire model was
built de novo in Coot*”. The crystal structure of human NPC1 (residues 334-1,278, 
PDB code 5U74) and the glycosylation sites of the PTCH1* extracellular domains 
were used to check the registers of our model. The de novo models of ideal helices 
were first put into the transmembrane region. Using the bulky size of some large 
side chains in the cryo-EM map, we assigned a sequence to the initial model. The 
model was refined using phenix.real_space_refine with real-space restraints, including
secondary-structure, stereochemical, Ramachandran and rotamer restraints, to
accommodate the bending of helices, maintain the stereochemistry of the helical 
structure and best fit the model and cryo-EM map. Finally, the two extracellular 
domains were added to the model gradually, at the same time as the sequence 
assignment. The density of residues 1-75 (N-terminal domain), 608-618 and 
721-730 (TM6-TM7 linker), 888-901 (in ECD-II) and 1,177-1,188 (C terminus) 
is not resolved or built. Residues 191-198, 211-263, 379-391, 457-466, 864-887, 
902-915 and 955-960 were built with polyalanine owing to limited local resolution.
For the complex, we sharpened the map using bfactor.exe with a resolution
limit of 3.8 Å and a B-factor value of −100 Å². The residues 149-153 of PTCH1*
were also built with polyalanine in the complex. Using the EMfit docking program 
(developed by Michael G. Rossmann), a single solution was obtained for the ori- 
entation of SHH-N in the cryo-EM map. Structures of SHH-N (PDB code 3M1N) 
and PTCH1* were docked into our final cryo-EM maps of the complex in Coot*”. 
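
As a rough guide to what a B-factor of −100 Å² does to a map, the sketch below evaluates the conventional amplitude weighting exp(−B·s²/4), where s = 1/d is the spatial frequency. This convention is an assumption made here for illustration; the text does not describe the exact weighting or low-pass filtering applied by bfactor.exe, and the function name is hypothetical.

```python
import math


def sharpening_gain(resolution_a, b_factor_a2):
    """Amplitude multiplier exp(-B * s^2 / 4) at spatial frequency s = 1/d.

    resolution_a: resolution d in Å.
    b_factor_a2: applied B-factor in Å^2 (negative values sharpen).
    """
    s = 1.0 / resolution_a
    return math.exp(-b_factor_a2 * s * s / 4.0)


# B-factor from the text; the resolutions are chosen for illustration.
for d in (10.0, 5.0, 3.5):
    print(f"{d:4.1f} Å: amplitude gain = {sharpening_gain(d, -100.0):.2f}")
```

Under this convention, amplitudes at 3.5 Å are boosted roughly 7.7-fold relative to the unsharpened map, whereas low-resolution terms are changed only slightly, which is why sharpening makes side-chain density easier to interpret.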
Model refinement and validation. The models of PTCH1* and its complex with 
SHH-N were refined in real space using PHENIX®* and in reciprocal space using 
Refmac with secondary-structure restraints and stereochemical restraints*?°. 
Structure factors were calculated from a half-map (working) using the program 
SFall*!. Fourier shell correlations (FSCs) were calculated between the two half 
maps, the model against the working map, the other (free) half map and full (sum) 
map”. Local resolutions were estimated using Blocres*’. MolProbity“* was used
to validate the geometries of the model. Structure figures were generated using 
PyMOL (http://www.pymol.org) and Chimera*®®. 

Pull-down assay. The unmodified human SHH-N proteins were expressed and 
purified from E. coli as described above. The HEK-293-derived native human 
SHH-N protein was purchased from R&D Systems (catalogue number 8908-SH/CF; 
see the LC/ESI-mass spectrometry analysis of this protein at http://bit.ly/2Ao- 
hYCG). For the pull-down assay, purified PTCH1* protein was immobilized to 
20 μl anti-Flag M2 resin, which was further incubated with unmodified or native
SHH-N for 1 h at 4°C in 150 μl buffer B. Then the resin was spun down and washed
three times with buffer B. The protein complex was eluted with 20 μl buffer B supplemented
with 0.3 mg ml⁻¹ Flag peptide. 15 μl of the elution was loaded on SDS-PAGE
for detection. To see whether 5E1 competes with PTCH1* when binding
SHH-N, 5E1 (from B. Chen and J. Kim, UT Southwestern) was added to SHH-N 
before incubating with the PTCH1*-immobilized anti-Flag M2 resin. For the pull 
down of the SHH-N-5E1 complex, the SHH-N (E. coli expressed with C-terminal 
His tag) was incubated with 5E1 at 1:1 molar ratio, and then the pull-down assays 
were performed as above. The SHH-N protein and PTCH1* were detected by 
anti-SHH antibody (sc-365112, Santa Cruz Biotechnology) and anti-Flag antibody 
(M185, MBL Life Science). For the detergent-free assay, buffer B was replaced by 
buffer A. Each assay was reproduced at least three times. 

HH reporter assays. Human SHH-N (24-197) was constructed into pcDNA3.1 
vector with the signal sequence of human calreticulin at the N terminus as 
described previously¹¹. Secreted SHH-N was produced in HEK-293 cells (ATCC)
by transient transfection for 72 h and was collected in DMEM, with 0.5% fetal 
bovine serum (FBS). SHH Light II cells, a stable cell line expressing firefly lucif- 
erase with a 8X-Gli promoter and Renilla luciferase with a constitutive promoter 
(from B. Chen and J. Kim), were used to measure HH pathway activity. SHH Light 
II cells were treated with the conditioned medium or purified protein diluted in 
fresh DMEM with 0.5% newborn calf serum for 30 h. To measure the activity 
of PTCH1 variants in HH signalling, the 8X-Gli luciferase firefly reporter trans- 
gene, a constitutive Renilla luciferase transgene and a pcDNA3.1 vector encoding 
PTCH1 variants were transfected into Ptch1−/− MEFs (from B. Chen and J. Kim)
using TransIT reagent (Mirus Bio LLC). After 24 h, cells were serum-starved in 
DMEM with 0.5% FBS. 24 h later, cells were treated with SHH-N conditioned 
medium for another 24 h. Firefly and Renilla luciferase were measured using the 
Dual-Luciferase Reporter Assay System (Promega). The conditioned medium 
added was normalized on the basis of western blotting with anti-SHH antibody. 
The expression of PTCH1 variants and internal calnexin in MEFs was detected by
western blotting with anti-PTCH1 antibody (GeneTex, 83771) and anti-calnexin 
antibody (Novus, NB100-1965). Each assay was reproduced at least three times 
and data were analysed using Excel (Microsoft). Bar graphs were generated by 
Prism (GraphPad). The cell lines tested negative for mycoplasma contamination. 
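
A dual-luciferase readout of this kind is commonly reduced to a firefly/Renilla ratio per well and then expressed relative to a control condition. The sketch below shows that calculation under stated assumptions: the ratio normalization and the use of the sample standard deviation are assumptions rather than details given in the text (the analysis was performed in Excel), the function names are hypothetical, and all readings are invented for illustration.

```python
from statistics import mean, stdev


def normalized_activity(firefly, renilla):
    """Per-well firefly signal divided by the Renilla internal control."""
    return [f / r for f, r in zip(firefly, renilla)]


def relative_to_control(ratios, control_ratios):
    """Mean and sample s.d. of the ratios, scaled to the control mean."""
    control_mean = mean(control_ratios)
    scaled = [x / control_mean for x in ratios]
    return mean(scaled), stdev(scaled)


# Hypothetical triplicate readings (n = 3), arbitrary luminescence units.
control = normalized_activity([100.0, 110.0, 90.0], [50.0, 55.0, 45.0])
treated = normalized_activity([600.0, 500.0, 700.0], [100.0, 100.0, 100.0])

fold_mean, fold_sd = relative_to_control(treated, control)
print(f"relative HH activity: {fold_mean:.2f} ± {fold_sd:.2f} (n = 3)")
```

Dividing by the Renilla signal corrects for well-to-well differences in cell number and transfection efficiency before conditions are compared.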
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The 3D cryo-EM density maps of PTCH1* and PTCH1*- 
SHH-N have been deposited in the Electron Microscopy Data Bank under acces- 
sion numbers EMD-7795 and EMD-7796. Atomic coordinates for the atomic 
model of PTCH1* and the PTCH1*-SHH-N complex have been deposited in the 
Protein Data Bank under accession numbers 6D4H and 6D4J. All other data are 
available from the corresponding authors on reasonable request. 
Extended Data Fig. 1 | Sequence alignment of human PTCH1 and PTCH2, mouse PTCH1 and Drosophila Ptc. The residue numbers
of human PTCH1 are indicated above the protein sequence. The transmembrane helices and secondary structures of extracellular domains
are labelled (structural elements of ECD-II with asterisk). Residues under the dashed lines are excluded from the 3D reconstruction.
Extended Data Fig. 2 | Biochemical properties of expressed human PTCH1 proteins. a–c, Size-exclusion chromatogram and SDS-PAGE gel
of the purified full-length PTCH1 (a), the purified PTCH1* (b) and the purified PTCH1*-SHH-N complex (c). Molecular standards are indicated
on the left side of the gels and above the elution curves. The assays were reproduced at least three times with similar results.
Extended Data Fig. 3 | Data processing and model quality assessment
of PTCH1*. a, The data processing workflow for PTCH1*. b, A
representative electron micrograph at a defocus of −2.0 μm. c, 2D
classification. d, FSC curve of the structure as a function of resolution
using Frealign output. e, The FSC curves calculated between the refined
structure and the half map used for refinement, the other half map and
the full map. f, Density maps of PTCH1* structure coloured by local
resolution estimate using Blocres.
Extended Data Fig. 4 | Electron microscopy density of different portions of PTCH1* at the 5σ level. a, TM1-TM6. b, TM7-TM12. c, ECD-I.
d, ECD-II. NAG, N-acetylglucosamine.
Extended Data Fig. 5 | NPC1 and PTCH1* SSD structural and surface comparison. a, NPC1 SSD. The putative pocket (indicated by the red arrow)
in the SSD is created by TM3-TM5. b, PTCH1* SSD. 
Extended Data Fig. 6 | Data processing and model quality assessment of PTCH1*-SHH-N. a–f, Same as Extended Data Fig. 3 but for PTCH1*-SHH-N.
Extended Data Fig. 7 | Electron microscopy density of different
portions of PTCH1*-SHH-N complex. a, TM1-TM6 at the 5σ level.
b, TM7-TM12 at the 5σ level. c, Major structural elements of ECD-I at the
4.5σ level. d, Major structural elements of ECD-II at the 4.5σ level.
e, Major structural elements of SHH-N at the 4.5σ level; palmitate (PLM)
at the 3σ level.
Extended Data Fig. 8 | PTCH1*-SHH-N binding assay in the detergent-free
system. a, Size-exclusion chromatogram and SDS-PAGE gel of the
purified PTCH1* with Amphipol A8-35 in buffer A. Molecular standards
are indicated on the left side of the gels and above the elution curves.
b, 5E1 does not compete with the binding of native SHH-N to PTCH1*.
5E1 and SHH-N at a 1:1 molar ratio were incubated with PTCH1*-immobilized
Flag M2 resin; the complex was eluted by Flag peptide.
Protein was detected by Coomassie staining. The assay was reproduced
three times with similar results.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics

                                   Ptch1*          Ptch1*-Shh-N
                                   (EMDB-7795)     (EMDB-7796)
                                   (PDB 6D4H)      (PDB 6D4J)
Data collection and processing
  Magnification                    29000           29000
  Voltage (kV)                     300             300
  Electron exposure (e−/Å²)        80              80
  Defocus range (μm)               -0.8 to -2.0    -0.8 to -2.2
  Pixel size (Å)                   1.0             1.0
  Symmetry imposed                 C1              C1
  Initial particle images (no.)    789,118         661,119
  Final particle images (no.)      167,840         195,051
  Map resolution (Å)               3.48            3.80
    FSC threshold                  0.143
  Map resolution range (Å)         3.1-7.0         3.5-7.0
Refinement
  Initial model used (PDB code)    3JD8            6D4H
  Model resolution (Å)             3.90            4.14
    FSC threshold                  0.5
  Model resolution range (Å)       3.90-256        4.14-256
  Map sharpening B factor (Å²)     -100            -100
  Model composition
    Non-hydrogen atoms             7318            8614
    Protein residues               964             1129
    Ligands                        8               8
  B factors (Å²)
    Protein                        127.8           184.9
    Ligand                         177.2           218.2
  R.m.s. deviations
    Bond lengths (Å)               0.0069          0.0063
    Bond angles (°)                1.1545          1.0467
Validation
  MolProbity score                 2.09            2.05
  Clashscore                       4.21            2.89
  Poor rotamers (%)                3.12            3.69
  Ramachandran plot
    Favored (%)                    91.2            90.2
    Allowed (%)                    8.8             9.8
    Disallowed (%)                 0               0
You are not a failed scientist


PhD students who leave academia should be supported, says Philipp Kruger. 


As a PhD student in my final year, I find
it demoralizing and frustrating to be
constantly reminded of the bleak job 
prospects in academia. This dim outlook may 
well increase the pressure on students and 
contribute to high rates of anxiety and depres- 
sion among them (T. M. Evans et al. Nature 
Biotechnol. 36, 282-284; 2018). The scientific 
community could help to solve this problem 
by encouraging us all to change the way we 
think about the PhD. And scientists can start 
by appreciating a simple truth: researchers who 
leave academia are not failed academics. 
Students and their supervisors must begin 
to regard a PhD programme as a traineeship in
scientific thinking and an invaluable qualifica- 
tion for a diverse range of careers. If everyone 
involved in academic science could accept 
a variety of roles as the default outcome, we
could change our flawed definition of success.
We could transition from a culture of failure 
to a healthier and happier scientific enterprise. 

I've found it daunting to determine the 
best career to match my personality, skills, 
priorities, ambitions and interests, particu- 
larly because most people around me treat 
the academic path as the default. But we PhD 
students have an obligation to see to our own 
professional futures. Many of us, of course, are 
driven by the excitement of discovery, and we 
relish the freedom to pursue our curiosity in 
an academic laboratory. 

However, some of us discover during our 
PhDs that in our dream job, the emphasis 
would be on using interpersonal and com- 
munication skills, having a more immediate 
impact on society or gaining financial rewards, 
job security or family-friendly working
hours. Our direction should be the result of
a conscious decision rather than a perception
of a lack of opportunities. And it should have
nothing to do with a sense or fear of ‘failure’.


WHAT PHD STUDENTS CAN DO 

To make an informed decision about their 
professional future, students need to use 
their initiative and be proactive. Here’s what 
has worked for me and what I think my student 
peers could do. 


Determine your preferences. A PhD provides 
you with diverse experiences, from conducting 
lab experiments to giving talks and supervising 
students. I’ve found it useful to write down 
whenever one of these tasks is particularly 
fascinating, enjoyable, stressful, boring or 
frustrating. Doing this over four years has
helped me to determine patterns and decide
which tasks should make up my future job. For 
example, this is how I discovered that I enjoy 
discussing and presenting my data much more 
than actually collecting them in the lab. 


Develop a range of skills. It was important 
for me to be aware of and actively develop 
transferable skills, such as teamwork, line 
management of more-junior students, pro- 
ject and time management, written commu- 
nication and conference presentations, from 
an early point in my PhD. You could seek 
activities outside the lab that provide fur- 
ther experience, such as organizing events, 
teaching, public engagement or writing a 
blog. I have, for example, co-organized con- 
ferences in Oxford for the British Society for 
Immunology, and I’m teaching immunology 
in various undergraduate programmes at the 
University of Oxford, UK. 


Find out what’s out there. This is not as 
straightforward as it should be, but I found 
that my university’s careers service was a 
good place to start. Career advisers regularly 
encounter PhD students with doubts about 
their future roles, and they can help you to 
structure your self-assessment and broadly 
match your preferences and skills to specific 
roles. I also read job adverts to find out about 
available positions and the skills that employ- 
ers seek. Another useful resource is science- 
careers fairs, where you can meet potential 
employers. Even after a lot of research online, I 
encountered jobs at those fairs that I had never 
heard of. 


Organize talks. Don’t depend solely on online
articles and blog posts to understand what a 
certain role is really like. For me, the most 
substantial insights have come from talking to 
people who actually have those jobs. I’ve found 
that those who have decided to leave the lab for 
other professions know how difficult it is and 
are willing to give talks about their journey. 
Initiation of such careers seminars often comes 
from students, and you could even organize 
one yourself. In my department, we plan four 
of those talks each year. Send this article to 
your department or programme director when 
you ask them to fund your event. 


Build your network. Early on, I found that my 
LinkedIn profile was a helpful tool for keep- 
ing track of connections and providing infor- 
mation about myself. If you send follow-up 
messages to people you have spoken to at semi- 
nars, meetings or fairs, they are more likely to 
remember you when you ask them for advice 
or work experience. For example, multiple 
career-seminar speakers have sent me details 
afterwards about their job-search strategy, 
companies they'd considered and more. 


Arrange an internship. Use your (or your 
supervisor’s) network, or cold-e-mail
people and ask them for internship or work-shadowing
opportunities. In February, I spent
one week at Nature Immunology (after meeting 
an editor at a conference), and this was enough 
to get a good understanding of day-to-day life 
in the job. I was involved in discussing article 
submissions, wrote a commentary and iden- 
tified appropriate reviewers for a paper. In 
addition, I spoke to people in the company and 
learnt about their roles: for example, I found 
out the differences between primary-journal 
and review-journal editors. 


WHAT SUPERVISORS CAN DO 

A free and open exploration of different 
career options requires a work environment 
that encourages students to take out time 
for career exploration and that actively helps 
them to find the relevant information. However,
most supervisors arguably know little about
non-academic careers and might not have much
incentive to provide students with access
to such information. As a consequence, only
one-third of the respondents in Nature’s 2017 
Graduate Student Survey reported receiving 
useful advice from their supervisor regarding 
non-academic careers (see https://go.nature.com/2qwsyfx).

When I was involved in organizing the 
2016 Medical Sciences Careers Day for PhD 
students at the University of Oxford, some 
suggested also inviting academic speakers to 
provide a more ‘balanced picture’. But PhD 
students already have plenty of role models in 
their academic environment and solid access 
to information about the academic career 
track. And the main purpose of events such 
as this is to provide information that students 
don't have access to, and thereby correct the 
already existing imbalance. 

PhD students alone are unlikely to 
transform the culture of academic science. 
I think that PhD programmes and individual 
supervisors have a responsibility to make it 
easy for students to find the path that is right 
for them, and I know many group leaders who 
see this as part of their role as a mentor. As a
principal investigator (PI), you can take several 
steps to support your students. 


Support students’ professional develop- 
ment. Offer students opportunities to teach, 
supervise, chair meetings, manage collabora- 
tions and write papers or grant applications. 
In my experience, this is already widely prac- 
tised, but some PIs do not always appreciate 
the importance of these training opportunities 
to students. I learnt the most when I had to 
develop a certain skill — say, doing a practice 
run for a major presentation — and received 
feedback. You could formalize this into a reg- 
ular meeting and discuss what the students 
enjoy and where they need more practice. 
Establish a flexible work environment. The
students I have met who had developed a clear 
career plan by the end of their PhD were 
usually from research groups that cultivated a 
flexible, low-pressure environment. Encourage 
students to take time out for extracurricular 
activities that will provide them with useful 
experiences. I have been fortunate to have had 
the opportunity to try out different things, but 
I know others who feel pressured to be in the 
lab all day, every day. 


Encourage work experience. Some formal 
PhD programmes include an option, even 
a requirement, to do an internship outside 
academia, and I think this is a great idea. 
Friends of mine have spent three months 
drafting policy at the World Health Organi- 
zation or filming science documentaries for 
the BBC. Everyone I know who has done an 
internship greatly valued the experience, 
and individual supervisors can support this 
— and let their students know that they do 
— by, for example, providing contacts in rel- 
evant sectors. 


Show your support. Arguably the most 
important thing that supervisors can do is 
openly show support for lab members and 
mentees who might be heading out of aca- 
demia. For example, you could say on your 
lab page that it is important to you to support 
your lab members’ professional development. 
You could supplement the statement with an 
up-to-date list of lab alumni to show that you 
are proud of their achievements. Potential PhD 
and postdoc applicants will greatly appreciate 
your stand on this issue. 


A PHD IS HIGHLY VALUABLE

This leaves us with one last aspect of the culture 
of failure and its effect on doctoral students 
and postdocs: the widespread misconception 
that a PhD is useful training only for academic 
research. Or, in other words, if you leave aca- 
demia, your mum will think that you've wasted 
your time doing a PhD. You might even have 
wondered about that yourself. 

We know that most PhD graduates 
eventually go on to other careers, but have they 
all wasted their time? Absolutely not. The skills 
you are acquiring (or have acquired) during a 
PhD are highly sought by employers beyond 
academic science. You are incredibly resilient, 
hard-working and motivated. You make deci- 
sions based on evidence, you can interpret 
data, you can communicate complex con- 
cepts clearly, you are an effective team player 
and you can prioritize tasks. And you have a 
degree to prove all of this. 

You have every reason to be positive about 
your job prospects. 

Personally, I won't regret having done my 
PhD, regardless of my future career.


Philipp Kruger is a PhD student in 
immunology at the University of Oxford, UK. 
THE TAIL OF DANNY WHISKERS


BY FAWAZ AL-MATROUK 


Ten miles from the Canadian border,
they were stopped by police.

“Don't say anything,” said Dr 
Tarboush, his hands turning red against the 
steering wheel. 

“You were driving like a maniac,” said the 
cat in the carrier box on the passenger seat. 
His name was Danny Whiskers. 

Dr Tarboush could feel his heartbeat in 
his hands as he watched the police officer 
approach in the rear-view mirror. 

“You think I’m stupid?” said Danny. “I’m
not stupid.”

“Shush!”

“I know when to be quiet.”

“Hello officer,” said Dr Tarboush.

The police officer kept his distance. 
“Licence and registration please.” 

Dr Tarboush fumbled through the glove 
compartment. It overflowed. Receipts, 
envelopes, pages from a scientific paper on 
the epigenetics of intelligence. 

“Hang on,” said the doctor, searching
through papers. “I have it somewhere.” 

“Goodness,” whispered Danny. 

“Shush!” 

“You’ll get us shot.”

“You're the talking cat!” 

The officer heard him. “Excuse me?” 

“Sorry, officer, just talking to myself. Found 
it!” Dr Tarboush produced the registration. 

The officer twisted his brow. “Hang tight.” 

Dr Tarboush watched the officer get smaller 
in the rear-view mirror. Beads of sweat col- 
lected on his bushy eyebrows. “He's onto us.” 

“Great. Ten miles from Canada and you'll 
have me killed.”

“You’re the one who can’t keep quiet.”

“You're the one swerving across lanes.” 

“That’s because my cat decided to break 
out of his carrier box and shout obscenities,” 
— he was loud now — “at my fellow drivers!” 

“Do they not understand what a fast lane is?” 

“Do you not understand what a fugitive is?” 

“I understand. I’m not stupid.”

“I get ten years for unauthorized experiments
in genetic engineering.”

“I know.”

“You get to learn how many ways there are 
to skin one of you.” 

“Stop.” 

“I should have left you in the lab and read
about you in the Vancouver Times, or whatever
they have up there, sipping chamomile
tea. ‘Talking cat!’ ‘Freak experiment!’ ‘Puss
gives heartfelt monologue as federal science
police put him to...’ Are you crying?”
It’s not easy being smart.


Danny was distinctly crying. “No.” 

“Stop crying.” 

“It’s not fair,” he mumbled. “I didn’t ask for
this. I didn’t ask to be smart. I was legal just a
week ago. Why did they have to change the 
laws? Why can’t they just let me be?” 

Dr Tarboush squirmed in the driver's seat. 
He was never good at comforting. 

“I don’t know, Danny.”

“It’s not fair.”

“No, it’s not ... People get scared and make 
laws and they ruin other people's lives and 
they don't understand how.” 

Danny sniffled. “Other people's lives?” 

“Yeah.”

“You mean I’m other people?” 

“Of course you are.” 

Danny let himself smile. “Smarter than 
human people, though.”

“Shove it.” 

Danny laughed. Dr Tarboush stuck his 
fingers through the metal grating of the 
carrier box. “Come here.” 

Danny leant into the fingers, accepting the 
neck rubs. He began to purr. 

“That’s a good boy,” said Dr Tarboush.

“Don’t patronize me,” said Danny, purring.

Heel clicks announced the officer’s return. 
Dr Tarboush sat up straight. The officer leant 
into the car, a yellow ticket between his fin- 
gers. He flung it onto the pile of papers. 

“Figured that’s where you want it,” he said. 

“Sure.” 

“It’s for changing lanes without a signal.”

Dr Tarboush nodded. “Right.”

“That your cat?”

“Yeah.”

“Cute.”

Danny groaned in his carrier box. Dr 
Tarboush felt his heart sink into his belly. 

The officer smiled. “You know, there’s people 
doing weird experiments on animals. Changing 
their genes and such. Something called crispy.”

Danny scoffed. 

Dr Tarboush scoffed to cover it up. 

“You scoffing at me?” 

“No, no, officer. At crispy,” his voice broke.
“It’s a stupid idea.”

The officer narrowed his eyes. “Why stupid?” 

Dr Tarboush cleared his throat. “It’s against
God’s plan, you know. There ought to be a law.”

“There is a law.”

“Is there?”

“Serious law. Serious time. There’s two 
fugitives from it now. An old man and his 
cat, last seen heading north from California. 
Talkative cat. Described as ‘uppity’.”

“Uppity!” It was Danny Whiskers. 

Dr Tarboush slammed his forehead 
against the steering wheel. 

The officer said: “That was the cat.” 

“Sir...”

“Sounded like the cat.”

“I’m a ventriloquist, sir. I can throw voices.”

“You’re a ventriloquist,” the officer laughed.
“Dr Taha Tarboush. PhD in ventriloquy. Or is 
it ventriloquation? I don’t know. I don't care. 
Back to the subject. You say it’s against God’s 
plan to mess with genetics. I say,” he leant in
to whisper, “that my dad has Alzheimer’s, and
I’m hoping for a breakthrough. Godspeed.”

With that, the officer walked away. Dr 
Tarboush could feel his heartbeats, six to 
every footstep. “But don't speed!” The officer 
called from his cruiser. 

“What just happened?” asked Danny 
Whiskers. 

“I think we got lucky.”

“I think truth won out.”

“Yeah.”

“I’m truth,” said Danny. “I won out.”

Dr Tarboush turned to his passenger seat. 
“We have ten miles to the border, Danny. 
One more sound, and you're a stray.” 

“What about a purr?” 

“Danny,” 

“Meow?” 

“Shut up.”

“What if my tail falls asleep? Can I let 
you know? Can I leave my box? What if it 
itches? What if I need a scratch? Scratching’s 
a sound. Can I make a scratch?” 

They drove ten miles to the border with- 
out a moment of silence between them. 
Shortly after they entered British Columbia, 
Danny Whiskers fell asleep.


Fawaz Al-Matrouk loves the latest tech, 
but writes with pen and paper. He is a film 
director by trade, currently developing a 
feature debut with the support of SFFILM. 